ENTROPY AND DRIFT IN WORD HYPERBOLIC GROUPS


SÉBASTIEN GOUËZEL, FRÉDÉRIC MATHÉUS, FRANÇOIS MAUCOURANT


Abstract. The fundamental inequality of Guivarc’h relates the entropy and the drift of 
random walks on groups. It is strict if and only if the random walk does not behave like 
the uniform measure on balls. We prove that, in any nonelementary hyperbolic group 
which is not virtually free, endowed with a word distance, the fundamental inequality is 
strict for symmetric measures with finite support, uniformly for measures with a given 
support. This answers a conjecture of S. Lalley. For admissible measures, this is proved 
using previous results of Ancona and Blachère-Haïssinsky-Mathieu. For non-admissible
measures, this follows from a counting result, interesting in its own right: we show that, 
in any infinite index subgroup, the number of non-distorted points is exponentially small. 
The uniformity is obtained by studying the behavior of measures that degenerate towards 
a measure supported on an elementary subgroup. 


1. Main results 

Let $\Gamma$ be a finitely generated infinite group. Although the following discussion makes
sense in a much broader context, we will assume that $\Gamma$ is hyperbolic since all results of
this article are devoted to this setting. There are two natural ways to construct random
elements in $\Gamma$:

• Let $d$ be a proper left-invariant distance on $\Gamma$ (for instance a word distance). For
large $n$, one can pick an element at random with respect to the uniform measure $\rho_n$
on the ball $B_n = B(e,n)$ (where $e$ denotes the identity of $\Gamma$).

• Let $\mu$ be a probability measure on $\Gamma$. For large $n$, one can pick an element at
random with respect to the measure $\mu^{*n}$ (the $n$-th convolution of the measure $\mu$).
Equivalently, let $g_1, g_2, \dots$ be a sequence of random elements of $\Gamma$ that are distributed
independently according to $\mu$. Form the random walk $X_n = g_1 \cdots g_n$. Then the
distribution of $X_n$ is $\mu^{*n}$.

From a theoretical point of view, these methods share a lot of properties. From a
computational point of view, the second method is much easier to implement in general groups
since it does not require the computation of the ball $B_n$ (note however that, in hyperbolic
groups, simulating the uniform measure is very easy thanks to the automatic structure of
the group). It is therefore of interest to find probability measures $\mu$ such that these two
methods give equivalent results, in a sense that will be made precise below. This is the
main question of Vershik in [Ver00]. In free groups (with the word distance coming from
the usual set of generators), everything can be computed: if $\mu$ is the uniform measure on
the generators, then $\mu^{*n}$ and $\rho_n$ behave essentially in the same way. The situation is the
same in free products of finite groups, again thanks to the underlying tree structure.
However, in more complicated groups, explicit computations are essentially impossible, and it
is expected that the methods always differ. Our main result confirms this intuition in a
special class of groups: in hyperbolic groups which are not virtually free (i.e., there is no
finite index free subgroup), if $d$ is a word distance, the two methods are always different, in
a precise quantitative way.

Remark 1.1. We emphasize that the question really depends on the choice of the distance
$d$, since the shape of the balls $B_n$ depends on $d$. For instance, for any symmetric probability
measure $\mu$ on $\Gamma$ whose support is finite and generates $\Gamma$, there exists a distance $d$ (called
the Green distance, see [BHM11]) for which the measures $\rho_n$ and $\mu^{*n}$ behave in the same
way. A famous open problem (to which our methods do not apply) is to understand what
happens when $\Gamma$ acts cocompactly on the hyperbolic space $\mathbb{H}^k$, and the distance $d$ is given
by $d(e,\gamma) = d_{\mathbb{H}^k}(O, \gamma \cdot O)$ where $O$ is a base point in $\mathbb{H}^k$. In this case, it is also expected
that the two methods are always different. Here are the main partial results in this context:

(1) The two methods are different for some symmetric measures with finite support 
([LP07], see also Theorem 5.9 below). 

(2) If, instead of a cocompact lattice, one considers a lattice with cusps, the two methods 
are always different [GLJ93]. 

(3) If, instead of a lattice, one considers a nice dense subgroup, there exist symmetric
measures with finite support for which the two methods are equivalent [Bou12].

This question also makes sense in continuous time, for negatively curved manifolds. A 
conjecture of Sullivan asserts that, in this setting, the two methods coincide if and only if 
the manifold is locally symmetric, see [Led95]. 

One can give several meanings to the question “are the two methods equivalent?” Let us
first discuss an interpretation in terms of behavior at infinity. The measures $\mu^{*n}$ converge in
the geometric compactification $\Gamma \cup \partial\Gamma$ to a measure $\mu_\infty$, supported on the boundary, called
the exit measure of the random walk, or its stationary measure. Geometrically, the random
walk $(X_n)_{n \ge 1}$ converges almost surely to a random point on the boundary $\partial\Gamma$, and the measure
$\mu_\infty$ is its distribution. On the other hand, let $\rho_\infty$ be the Patterson-Sullivan measure on $\partial\Gamma$
associated to the distance $d$, constructed in [Coo93] in this context. One should think of
it as the uniform measure on the boundary (it is equivalent to the Hausdorff measure of
maximal dimension on the boundary, for any visual distance coming from $d$). The measures
$\rho_n$ do not always converge to $\rho_\infty$, but all their limit points are equivalent to $\rho_\infty$, with a
density bounded from above and from below (this follows from the arguments of [Coo93],
see Lemma 2.13 below). A version of the question is then to ask if the measures $\mu_\infty$ and
$\rho_\infty$ are mutually singular: in this case, the random walk mainly visits parts of the group
that are not important from the point of view of the uniform measure.

Another version of the same question is quantitative: Does the random walk visit parts of
the group that are exponentially negligible from the point of view of the uniform measure?
This is made precise through the notions of drift and entropy. Define

(1.1) $L(\mu) = \sum_{g \in \Gamma} \mu(g)\,|g|, \qquad H(\mu) = \sum_{g \in \Gamma} \mu(g)\,(-\log \mu(g)),$

where $|g| = d(e,g)$. The quantity $L(\mu)$ is the average distance of an element to the identity.
The quantity $H(\mu)$, called the time one entropy of $\mu$, is the average logarithmic weight of
the points. They can both be finite or infinite. The functions $L$ and $H$ both behave in a
subadditive way with respect to convolution: $L(\mu_1 * \mu_2) \le L(\mu_1) + L(\mu_2)$ and
$H(\mu_1 * \mu_2) \le H(\mu_1) + H(\mu_2)$. It follows that the sequences $L(\mu^{*n})$ and $H(\mu^{*n})$ are subadditive. Hence,
the following quantities are well defined:

(1.2) $\ell(\mu) = \lim L(\mu^{*n})/n, \qquad h(\mu) = \lim H(\mu^{*n})/n.$

They are called respectively the drift and the asymptotic entropy of the random walk.
They also admit characterizations along typical trajectories. If $L(\mu)$ is finite, then almost
surely $\ell(\mu) = \lim |X_n|/n$. In the same way, if $H(\mu)$ is finite, then almost surely
$h(\mu) = \lim (-\log \mu^{*n}(X_n))/n$. The most intuitive characterization of the entropy is probably the
following one: at time $n$, the random walk is essentially supported by $e^{hn}$ points (see
Lemma 2.4 for a precise statement). Let us also define the exponential growth rate of the
group with respect to $d$, i.e.,

(1.3) $v = \liminf_{n \to \infty} \frac{\log |B_n|}{n},$

where $B_n$ is the ball of radius $n$ around $e$. In hyperbolic groups, it satisfies the apparently
stronger inequality $C^{-1} e^{nv} \le |B_n| \le C e^{nv}$, by [Coo93]. For large $n$, most points for
$\mu^{*n}$ are contained in a ball of radius $(1+\epsilon)\ell n$, which has cardinality at most
$e^{(1+2\epsilon)\ell n v}$. Since the random walk at time $n$ essentially visits $e^{hn}$ points, we deduce
the fundamental inequality of Guivarc’h [Gui80]

$h \le \ell v.$

If this inequality is an equality, this means that the walk visits most parts of the group.
Otherwise, it is concentrated in an exponentially small subset. Another version of our main
question is therefore: Is the inequality $h \le \ell v$ strict?
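
For the free group of rank 2 with its standard generators, the quantities $L(\mu^{*n})$ and $H(\mu^{*n})$ can be computed exactly, which gives a quick numerical check of the equality case of the fundamental inequality. The following is a minimal sketch, not taken from the article: the function name `free_group_srw` is hypothetical, and the computation relies on two standard facts about the simple random walk on a regular tree (the distance to the identity is a Markov chain on the integers, and $\mu^{*n}$ is uniform on each sphere).

```python
import math

def free_group_srw(n, rank=2):
    """Exact L(mu^{*n}) and H(mu^{*n}) for the simple random walk on the free
    group of the given rank (mu uniform on the generators and their inverses,
    word distance for the same generators)."""
    q = 2 * rank                       # number of generators, with inverses
    up, down = (q - 1) / q, 1 / q      # |X| -> |X| + 1 or |X| - 1 when |X| > 0
    p = [1.0] + [0.0] * n              # p[k] = P(|X_t| = k), starting at e
    for _ in range(n):
        new = [0.0] * (n + 1)
        for k, pk in enumerate(p):
            if pk == 0.0:
                continue
            if k == 0:
                new[1] += pk           # every step from the identity goes outwards
            else:
                new[k + 1] += pk * up
                new[k - 1] += pk * down
        p = new
    L = sum(k * pk for k, pk in enumerate(p))
    H = 0.0
    for k, pk in enumerate(p):
        if pk > 0.0:
            # log of the sphere size |S_k| = q (q - 1)^(k - 1) for k >= 1
            log_sphere = 0.0 if k == 0 else math.log(q) + (k - 1) * math.log(q - 1)
            # each point of S_k has weight pk / |S_k|
            H += pk * (log_sphere - math.log(pk))
    return L, H

L, H = free_group_srw(400)
print(L / 400, H / 400, H / L, math.log(3))
```

In this example the growth rate is $v = \log 3$; the printed ratio `H / L` approaches it as `n` grows, while `L / n` approaches the drift $1/2$, in line with the statement that $\mu^{*n}$ and $\rho_n$ behave similarly in free groups.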

In hyperbolic groups, it turns out that the two versions of the question are equivalent,
at least for finitely supported measures, and that they also have a geometric interpretation
in terms of Hausdorff dimension. If $\mu$ is a probability measure on a group, we write $\Gamma_\mu^+$
for the semigroup generated by the support of $\mu$, and $\Gamma_\mu$ for the group it generates. When
$\mu$ is symmetric, they coincide. We say that $\mu$ is admissible if $\Gamma_\mu^+ = \Gamma$. The following
result is Corollary 1.4 and Theorem 1.5 in [BHM11] (see also [Haï13]) when the measure is
symmetric, and is proved in [Tan14] when $\mu$ is not necessarily symmetric and $d$ is a word
distance.

Theorem 1.2. Let $\Gamma$ be a non-elementary hyperbolic group, endowed with a left-invariant
distance $d$ which is hyperbolic and quasi-isometric to a word distance. Let $v$ be the
exponential growth rate of $(\Gamma,d)$. Let $d_{\partial\Gamma}$ be a visual distance on $\partial\Gamma$ associated to $d$. Consider
an admissible probability measure $\mu$ on $\Gamma$, with finite support. Assume additionally either
that the measure $\mu$ is symmetric, or that the distance $d$ is a word distance. The following
conditions are equivalent:

(1) The equality $h = \ell v$ holds.

(2) The Hausdorff dimension of the exit measure $\mu_\infty$ on $(\partial\Gamma, d_{\partial\Gamma})$ is equal to the
Hausdorff dimension of this space.

(3) The measure $\mu_\infty$ is equivalent to the Patterson-Sullivan measure $\rho_\infty$.

(4) The measure $\mu_\infty$ is equivalent to the Patterson-Sullivan measure $\rho_\infty$, with density
bounded from above and from below.

(5) There exists $C > 0$ such that, for any $g \in \Gamma$,

$|v\,d(e,g) - d_\mu(e,g)| \le C,$

where $d_\mu$ is the “Green distance” associated to $\mu$, i.e., $d_\mu(e,g) = -\log \mathbb{P}(\exists n,\ X_n = g)$
where $X_n$ is the random walk given by $\mu$ starting from the identity (it is an
asymmetric distance in general, and a genuine distance if $\mu$ is symmetric).

The different statements in this theorem go from the weakest to the strongest: since
entropy is an asymptotic quantity, an assumption on $h$ seems to allow subexponential
fluctuations, so the assumption (1) is rather weak. On the other hand, (3) says that two
measures are equivalent, so most points are controlled. Finally, in (5), all points are
uniformly controlled. The equivalence between these statements is a strong rigidity theorem.
The equivalence between (1) and (2) follows from a formula for the respective dimensions.
The definition of a visual distance at infinity $d_{\partial\Gamma}$ involves a small parameter $\epsilon$. In terms of
this parameter, one has $HD(\mu_\infty) = h/(\epsilon\ell)$ and $HD(\rho_\infty) = HD(\partial\Gamma) = v/\epsilon$, so that these
dimensions coincide if and only if $h = \ell v$.

In this theorem, the finite support assumption can be weakened to an assumption of
superexponential moment (i.e., for all $M > 0$, $\sum_{g \in \Gamma} \mu(g) e^{M|g|} < \infty$), thanks to [Gou13].

The assumption that $\mu$ is symmetric or that $d$ is a word distance is probably not necessary.
However, the most important assumption in Theorem 1.2 is admissibility: it ensures that
the random walk can see the geometry of the whole group (which is hyperbolic). For a
random walk living in a strict (maybe distorted) subgroup, one would not expect the
same nice behavior.

Our main theorem follows. It states that, in hyperbolic groups which are not virtually 
free, endowed with a word distance, the different equivalent conditions of Theorem 1.2 are 
never satisfied, uniformly on measures with a fixed support. 

Theorem 1.3. Let $\Gamma$ be a hyperbolic group which is not virtually free, endowed with a word
distance $d$. Let $\Sigma$ be a finite subset of $\Gamma$. There exists $c < 1$ such that, for any symmetric
probability measure $\mu$ supported in $\Sigma$,

$h(\mu) \le c\,\ell(\mu)\,v,$

where $v$ is the exponential growth rate of balls in $(\Gamma,d)$.

This theorem gives a positive answer to a conjecture of S. Lalley [Lal14, slide 16]. In the
language of Vershik [Ver00], this theorem says that no finite subset of $\Gamma$ is extremal. On
the other hand, if one lets $\Sigma$ grow, $h/\ell$ can converge to $v$:

Theorem 1.4. Let $\Gamma$ be a hyperbolic group, endowed with a left-invariant distance $d$ which
is hyperbolic and quasi-isometric to a word distance. Let $\rho_i$ be the uniform measure on the
ball of radius $i$. Then $h(\rho_i)/\ell(\rho_i) \to v$, where $v$ is the exponential growth rate of balls in
$(\Gamma,d)$.

More precisely, we prove that $\ell(\rho_i) \sim i$ and $h(\rho_i) \sim iv$. The only difficulty is to prove the
lower bound on $h(\rho_i)$: since $h$ is defined in (1.2) using a subadditive sequence, upper bounds
are automatic, but to get lower bounds one should show that additional cancellations do
not happen later on. This difficulty already appears in [EK13], where the authors prove
that the entropy depends continuously on the measure. Our proof of Theorem 1.4, given
in Paragraph 2.5, also applies to this situation and gives a new proof of their result, under
slightly weaker assumptions. There is nothing special about the uniform measure on balls:
our proof also gives the same conclusion for the uniform measure on spheres, or for the
measures $\sum e^{-s|g|} \delta_g / \sum e^{-s|g|}$ when $s \searrow v$.

Our main result is Theorem 1.3. It is a consequence of the three following results. Since 
their main aim is Theorem 1.3, they are designed to handle finitely supported symmetric 
measures. However, these theorems are all valid under weaker assumptions, which we specify 
in the statements as they carry along implicit information on the techniques used in the 
proofs. 

The first result deals with admissible (or virtually admissible) measures. 

Theorem 1.5. Let $\Gamma$ be a hyperbolic group which is not virtually free, endowed with a word
distance. Let $\mu$ be a probability measure with a superexponential moment, such that $\Gamma_\mu^+$ is a
finite index subgroup of $\Gamma$. Then $h(\mu) < \ell(\mu)v$.

The second result deals with non-admissible measures. 

Theorem 1.6. Let $\Gamma$ be a hyperbolic group endowed with a word distance. Let $\mu$ be a
probability measure with a moment of order 1 (i.e., $L(\mu) < \infty$). Assume that $\ell(\mu) > 0$ and
that $\Gamma_\mu$ has infinite index in $\Gamma$. Then $h(\mu) < \ell(\mu)v$.

Finally, the third result is a kind of continuity statement, to get the uniformity. 

Theorem 1.7. Let $\Gamma$ be a hyperbolic group, endowed with a left-invariant distance which is
hyperbolic and quasi-isometric to a word distance. Let $\Sigma$ be a subset of $\Gamma$ which does not
generate an elementary subgroup. There exists a probability measure $\mu_\Sigma$ with finite support
such that $\ell(\mu_\Sigma) > 0$ and

$\sup\{h(\mu)/\ell(\mu) : \mu \text{ probability},\ \mathrm{Supp}(\mu) \subset \Sigma,\ \ell(\mu) > 0\} = h(\mu_\Sigma)/\ell(\mu_\Sigma).$

The same statement holds if the maximum is taken over symmetric probability measures,
the resulting maximizing measure being symmetric.

Theorem 1.3 is a consequence of these three statements. 

Proof of Theorem 1.3 using the three auxiliary theorems. As in the statement of the
theorem, consider a finite subset $\Sigma$ of $\Gamma$. If $\Sigma$ generates an elementary subgroup of $\Gamma$, all
measures supported on $\Sigma$ have zero entropy. Hence, one can take $c = 0$ in the statement
of the theorem. Otherwise, by Theorem 1.7, there exists a symmetric measure $\mu_\Sigma$ with
finite support that maximizes the quantity $h(\mu)/\ell(\mu)$ over $\mu$ symmetric supported by $\Sigma$. If
$\Gamma_{\mu_\Sigma} = \Gamma_{\mu_\Sigma}^+$ has finite index, $h(\mu_\Sigma)/\ell(\mu_\Sigma) < v$ by Theorem 1.5. If it has infinite index, the
same conclusion follows from Theorem 1.6. □

The three auxiliary theorems are non-trivial. Their proofs are independent, and use
completely different tools. Here are some comments about them.

• At first sight, Theorem 1.5 seems to be the most delicate (this is the only one with
the assumption that $\Gamma$ is not virtually free). However, this is also the setting that
has been most studied in the literature. Hence, we may use several known results,
including most notably results of Ancona [Anc87], of Blachère, Haïssinsky and
Mathieu [BHM11] and Tanaka [Tan14] (Theorem 1.2 above) and of Izumi, Neshveyev and
Okayasu [INO08] on rigidity results for cocycles. The proof relies mainly on the
fact that the word distance is integer valued, contrary to the Green distance (more
precisely, we use the fact that the stable translation length of hyperbolic elements
is rational with bounded denominator).

• In Theorem 1.6, the difficulty comes from the lack of information on the subgroup $\Gamma_\mu$.
If it has good geometric properties (for instance if it is quasi-convex), one may use
the same kind of techniques as for Theorem 1.5. Otherwise, the random walk does
not really see the hyperbolicity of the ambient group. The fundamental inequality
always gives $h \le \ell v_{\Gamma_\mu}$, where $v_{\Gamma_\mu}$ is the growth rate of the subgroup $\Gamma_\mu$ (for the
initial word distance on $\Gamma$). If $v_{\Gamma_\mu} < v$, the result follows. Unfortunately, there
exist non-quasi-convex subgroups of some hyperbolic groups with the same growth
as the ambient group. However, a random walk does not typically visit all points
of $\Gamma_\mu$: it concentrates on those points that are not distorted (i.e., their distances to
the identity in $\Gamma$ and $\Gamma_\mu$ are comparable). To prove Theorem 1.6, we will show that
in any infinite index subgroup of a hyperbolic group, the number of non-distorted
points is exponentially smaller than $e^{nv}$.

• Theorem 1.7 is less simple than it may seem at first sight: it does not claim that $\mu_\Sigma$ is
supported by $\Sigma$, and indeed this is not the case in general (see Example 5.4). Hence,
the proof is not a simple continuity argument: we need to understand precisely the
behavior of sequences of measures that degenerate towards a measure supported on
an elementary subgroup. The proof will show that $\mu_\Sigma$ is supported by $K \cdot (\Sigma \cup \{e\}) \cdot K$,
where $K$ is a finite subgroup generated by some elements in $\Sigma$.

A natural question is whether Theorem 1.3 holds for non-symmetric measures. For
admissible measures (i.e., $\Gamma_\mu^+ = \Gamma$), Theorem 1.5 holds. For non-symmetric measures such
that $\Gamma_\mu$ has infinite index, Theorem 1.6 applies directly. However, since $\Gamma_\mu \ne \Gamma_\mu^+$ for general
non-symmetric measures, there is another case to consider: the case of measures $\mu$ such that
$\Gamma_\mu = \Gamma$ (or $\Gamma_\mu$ has finite index in $\Gamma$), but $\Gamma_\mu^+$ is much smaller than $\Gamma$. In this case, it seems
that our arguments do not suffice. We give in Section 6 two examples illustrating the new
difficulties:

(1) One cannot rely on growth arguments, as for Theorem 1.6. Indeed, there are
subsemigroups $\Lambda^+$ with bad asymptotic behavior, for instance such that
$\liminf |B_n \cap \Lambda^+| / |B_n| = 0$ and $\limsup |B_n \cap \Lambda^+| / |B_n| > 0$.

(2) The arguments of Theorem 1.5 work for finitely supported measures, or for measures
with a superexponential moment, but also more generally for measures with a nice
geometric behavior (they should satisfy so-called Ancona inequalities). In the
non-symmetric situation, we give in Proposition 6.2 explicit examples of (non-admissible)
measures with an exponential moment and a very nice geometric behavior, and such
that nevertheless $h = \ell v$. So, arguments similar to those of Theorem 1.5 cannot
suffice: one needs a new argument that distinguishes in a finer way between measures
with finite support and measures with infinite support.

This article is organized as follows. In Section 2, we give more details on the notions of 
hyperbolic group, drift and entropy. We also prove Theorem 1.4 on the asymptotic entropy 
and drift of the uniform measure on large balls. The following three sections are then 
devoted to the proofs of the three auxiliary theorems. Finally, we describe in Section 6 
what can happen in the non-symmetric setting. In particular, we show that in any
torsion-free group with infinitely many ends, there exist (non-admissible, non-symmetric) measures
with an exponential moment satisfying $h = \ell v$.

2. General properties of entropy and drift in hyperbolic groups 

2.1. Hyperbolic spaces. In this paragraph, we recall classical properties of hyperbolic 
spaces. See for instance [GdlH90] or [BH99]. 

Consider a metric space $(X,d)$. The Gromov product of two points $y, y' \in X$, based at
$x_0 \in X$, is by definition

(2.1) $(y|y')_{x_0} = \frac{1}{2}\,[d(x_0,y) + d(x_0,y') - d(y,y')].$

The space $(X,d)$ is hyperbolic if there exists $\delta \ge 0$ such that, for any $x_0, y_1, y_2, y_3$, the
following inequality holds:

$(y_1|y_3)_{x_0} \ge \min((y_1|y_2)_{x_0}, (y_2|y_3)_{x_0}) - \delta.$

The main intuition to have is that, in hyperbolic spaces, configurations of finitely many
points look like configurations in trees: for any $k$, for any subset $F$ of $X$ with cardinality at
most $k$, there exists a map $\Phi$ from $F$ to a tree such that, for all $x, y \in F$,

$d(x,y) - 2k\delta \le d(\Phi(x),\Phi(y)) \le d(x,y).$

Hence, a lot of distance computations can be reduced to equivalent computations in trees
(which are essentially combinatorial), up to a bounded error. Up to $\delta$, the Gromov product
$(y|y')_{x_0}$ is, in the approximating tree, the length of the part that is common to the geodesics
from $x_0$ to $y$ and from $x_0$ to $y'$.
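
In a tree the picture above is exact ($\delta = 0$), and this can be checked by hand in the Cayley graph of a free group. The following sketch is an illustration under stated assumptions, not part of the article: elements of the free group on $a, b$ are encoded as reduced strings (lowercase for a generator, uppercase for its inverse), and all function names are hypothetical. It computes the Gromov product from formula (2.1) and compares it with the length of the common prefix of the two words.

```python
def reduce_word(word):
    """Freely reduce a word: a letter cancels with its case-swapped neighbour."""
    out = []
    for c in word:
        if out and out[-1] == c.swapcase():
            out.pop()
        else:
            out.append(c)
    return "".join(out)

def inverse(word):
    return word[::-1].swapcase()

def dist(x, y):
    """Word distance d(x, y) = |x^{-1} y| for the standard generators."""
    return len(reduce_word(inverse(x) + y))

def gromov_product(y, z, base=""):
    """Formula (2.1), with the identity (empty word) as default base point."""
    return (dist(base, y) + dist(base, z) - dist(y, z)) / 2

def common_prefix_len(y, z):
    """Length of the common initial segment of two reduced words."""
    n = 0
    for a, b in zip(y, z):
        if a != b:
            break
        n += 1
    return n

def satisfies_tree_inequality(y1, y2, y3):
    """Hyperbolicity inequality with delta = 0, based at the identity."""
    return gromov_product(y1, y3) >= min(gromov_product(y1, y2),
                                         gromov_product(y2, y3))

print(gromov_product("abAb", "abba"), common_prefix_len("abAb", "abba"))
```

For reduced words the two printed numbers agree: the geodesics from the identity to `abAb` and to `abba` share exactly the initial segment `ab`.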

A space $(X,d)$ is geodesic if there exists a geodesic between any pair of points. For such
spaces, there is a convenient characterization of hyperbolicity. A geodesic space $(X,d)$ is
hyperbolic if and only if there exists $\delta \ge 0$ such that its geodesic triangles are $\delta$-thin, i.e.,
each side is included in the $\delta$-neighborhood of the union of the two other sides.

Assume that $(X,d_X)$ and $(Y,d_Y)$ are two geodesic metric spaces, and that they are
quasi-isometric. If $(X,d_X)$ is hyperbolic, then so is $(Y,d_Y)$. Note however that this equivalence
only holds for geodesic spaces.

Let $(X,d)$ be a geodesic hyperbolic metric space. A subset $Y$ of $X$ is quasi-convex if
there exists a constant $C$ such that, for any $y, y' \in Y$, the geodesics from $y$ to $y'$ stay in the
$C$-neighborhood of $Y$.

We will sometimes encounter hyperbolic spaces which are not geodesic, but only
quasi-geodesic: there exist constants $C > 0$ and $\lambda$ such that any two points can be joined by a
$(\lambda,C)$-quasi-geodesic, i.e., a map $f$ from a real interval to $X$ such that
$\lambda^{-1}|t' - t| - C \le d(f(t), f(t')) \le \lambda|t' - t| + C$. When the space is geodesic, a quasi-geodesic stays a bounded
distance away from a true geodesic. Most properties that hold or can be defined using
geodesics (for instance the notion of quasi-convexity) can be extended to this setting, simply
replacing geodesics with quasi-geodesics in the statements.

Let $(X,d)$ be a proper geodesic hyperbolic space. Its boundary at infinity $\partial X$ is by
definition the set of geodesics originating from a base point $x_0$, where two such geodesics
are identified if they remain a bounded distance away from each other. It is a compact space, which does
not depend on $x_0$. The space $X \cup \partial X$ is also compact. If $X$ is only quasi-geodesic, all these
definitions extend using quasi-geodesics instead of geodesics.

Any isometry (or, more generally, quasi-isometry) of a hyperbolic space extends
continuously to its boundary, giving a homeomorphism of $\partial X$.

The Gromov product may be extended to $X \cup \partial X$: we define $(\xi|\eta)_{x_0}$ as the infimum limit
of $(x_n|y_n)_{x_0}$ for $x_n$ and $y_n$ converging respectively to $\xi$ and $\eta$. The choice to take the infimum
is arbitrary: one could also take the supremum or any accumulation point, as those quantities
differ by at most a constant only depending on $\delta$. Hence, one should think of the Gromov
product at infinity as canonically defined only up to an additive constant. Heuristically,
$(\xi|\eta)_{x_0}$ is the time after which two geodesics from $x_0$ to $\xi$ and to $\eta$ start diverging.

Let $(X,d)$ be a proper geodesic (or quasi-geodesic) hyperbolic space. For any small
enough $\epsilon > 0$, one may define a visual distance $d_{\partial X,\epsilon}$ on $\partial X$ such that
$d_{\partial X,\epsilon}(\xi,\eta) \asymp e^{-\epsilon(\xi|\eta)_{x_0}}$
(meaning that the ratio between these quantities is uniformly bounded from above and from
below).

Let $(X,d)$ be a proper hyperbolic metric space. One can define another boundary of
$X$, the Busemann boundary (or horoboundary), as follows. Let $x_0$ be a fixed basepoint
in $X$. To $x \in X$, one associates its horofunction $h_x(y) = d(y,x) - d(x_0,x)$, normalized
so that $h_x(x_0) = 0$. The map $\Phi : x \mapsto h_x$ is an embedding of $X$ into the space of
1-Lipschitz functions on $X$, with the topology of uniform convergence on compact sets. The
horoboundary is obtained by taking the closure of $\Phi(X)$. In other words, a sequence $x_n \in X$
converges to a boundary point if $h_{x_n}(y)$ converges, uniformly on compact sets. Its limit is
the horofunction $h_\xi$ associated to the corresponding boundary point $\xi$ (it is also called the
Busemann function associated to $\xi$). We denote by $\partial_B X$ the Busemann boundary of $X$.
There is a continuous projection $\pi_B : \partial_B X \to \partial X$, which is onto but not injective in general.
The boundary $\partial_B X$ is rather sensitive to fine scale details of the distance $d$, while $\partial X$ only
depends on its quasi-isometry class.

Any isometry $\varphi$ of $X$ acts on horofunctions, by the formula
$h_{\varphi(x)}(y) = h_x(\varphi^{-1}y) - h_x(\varphi^{-1}x_0)$. This implies that $\varphi$ extends to a homeomorphism on $\partial_B X$, given by the same
formula $h_{\varphi(\xi)}(y) = h_\xi(\varphi^{-1}y) - h_\xi(\varphi^{-1}x_0)$. Note that, contrary to the action on the geometric
boundary, this only works for isometries of $X$, not quasi-isometries.

2.2. Hyperbolic groups. Let $\Gamma$ be a finitely generated group, with a finite symmetric
generating set $S$. Denote by $d = d_S$ the corresponding word distance. The group $\Gamma$ is
hyperbolic if the metric space $(\Gamma, d_S)$ is hyperbolic. Since hyperbolicity is invariant under
quasi-isometry for geodesic spaces, this notion does not depend on the choice of the
generating set $S$. However, if one considers another left-invariant distance on $\Gamma$ which is equivalent
to $d_S$ but not geodesic, its hyperbolicity is not automatic. Hence, one should postulate its
hyperbolicity if it is needed, as in the statement of Theorem 1.2. We say that the pair $(\Gamma, d)$
is a metric hyperbolic group if the group $\Gamma$ is hyperbolic for one (or, equivalently, for any)
word distance, and if the distance $d$ is left-invariant, hyperbolic, and quasi-isometric to one
(or equivalently, any) word distance. Such a distance $d$ does not have to be geodesic, but it
is quasi-geodesic since geodesics for a given word distance form a system of quasi-geodesics
for $d$, going from any point to any point.

Let $(\Gamma, d)$ be a metric hyperbolic group. The left-multiplication by elements of $\Gamma$ is
isometric. Hence, $\Gamma$ acts by homeomorphisms on its compactifications $\Gamma \cup \partial\Gamma$ and $\Gamma \cup \partial_B\Gamma$.
Moreover, any infinite order element $g \in \Gamma$ acts hyperbolically on $\Gamma \cup \partial\Gamma$: it has two fixed
points at infinity $g^-$ and $g^+$, the points in $\Gamma \cup \partial\Gamma \setminus \{g^-\}$ are attracted to $g^+$ by forward
iteration of $g$, and the points in $\Gamma \cup \partial\Gamma \setminus \{g^+\}$ are attracted to $g^-$ by backward iteration of
$g$.

Definition 2.1. Consider an action of a group $\Gamma$ on a space $Z$. A function $c : \Gamma \times Z \to \mathbb{R}$
is a cocycle if, for any $g, h \in \Gamma$ and any $\xi \in Z$,

(2.2) $c(gh,\xi) = c(g,h\xi) + c(h,\xi).$

The cocycle is Hölder-continuous if $Z$ is a metric space and each function $\xi \mapsto c(g,\xi)$ is
Hölder-continuous.

There is a choice to be made in the definition of cocycles, since one may compose with $g$ or
$g^{-1}$. Our definition is the most customary. With this definition, the map $c_B : \Gamma \times \partial_B\Gamma \to \mathbb{R}$
given by $c_B(g,\xi) = h_\xi(g^{-1})$ is a cocycle, called the Busemann cocycle.

A subgroup $H$ of $\Gamma$ is nonelementary if its action on $\partial\Gamma$ does not fix a finite set.
Equivalently, $H$ is not virtually the trivial group or $\mathbb{Z}$. We say that a probability measure $\mu$ on $\Gamma$
is nonelementary if the subgroup $\Gamma_\mu$ generated by its support is itself nonelementary.

Let $\mu$ be a probability measure on $\Gamma$. Since $\Gamma$ acts by homeomorphisms on the compact
space $\partial\Gamma$, it admits a stationary measure: there exists a probability measure $\nu$ on $\partial\Gamma$ such
that $\mu * \nu = \nu$, i.e., $\sum_{g \in \Gamma} \mu(g)\, g_*\nu = \nu$. If $\mu$ is nonelementary, this measure is unique,
and has no atom (see [Kai00]). It is also the exit measure of the corresponding random
walk $X_n = g_1 \cdots g_n$: almost every trajectory $X_n(\omega)$ converges to a point $X_\infty(\omega) \in \partial\Gamma$, and
moreover the distribution of $X_\infty$ is precisely $\nu$.

In the same way, since $\Gamma$ acts on $\partial_B\Gamma$, it admits a stationary measure $\nu_B$ there. This
measure is not unique in general, even if $\mu$ is nonelementary. However, all such measures
project under $\pi_B$ to the unique stationary measure on $\partial\Gamma$.

2.3. The drift. Let $(\Gamma,d)$ be a metric hyperbolic group. Consider a probability measure
$\mu$ on $\Gamma$, with finite first moment $L(\mu)$ (defined in (1.1)). The drift of the random walk has
been defined in (1.2) as $\ell(\mu) = \lim L(\mu^{*n})/n$. Let $X_n = g_1 \cdots g_n$ be the position at time $n$
of the random walk generated by $\mu$ (where the $g_i$ are independent and distributed according
to $\mu$). Then, almost surely, $\ell(\mu) = \lim |X_n|/n$.

The drift also admits a description in terms of the Busemann boundary. The following
result is well-known (compare with [KL11, Theorem 18]).

Proposition 2.2. Let $(\Gamma,d)$ be a metric hyperbolic group. Let $\mu$ be a nonelementary
probability measure on $\Gamma$ with finite first moment. Let $\nu_B$ be a $\mu$-stationary measure on $\partial_B\Gamma$.
Then

(2.3) $\ell(\mu) = \int_{\Gamma \times \partial_B\Gamma} c_B(g,\xi)\, d\mu(g)\, d\nu_B(\xi).$

Proof. Let $X_n$ be the position of the random walk at time $n$. Using the cocycle property of
the Busemann cocycle, we have

$\int c_B(X_n(\omega),\xi)\, dP(\omega)\, d\nu_B(\xi) = \int c_B(g_1 \cdots g_n,\xi)\, d\mu(g_1) \cdots d\mu(g_n)\, d\nu_B(\xi)$

$= \sum_{k=1}^{n} \int c_B(g_k, g_{k+1} \cdots g_n \xi)\, d\mu(g_k) \cdots d\mu(g_n)\, d\nu_B(\xi).$

Since the measure $\nu_B$ is stationary, the point $g_{k+1} \cdots g_n \xi$ is distributed according to $\nu_B$.
Hence, the terms in the above sum do not depend on $k$. We get

(2.4) $\int_{\Gamma \times \partial_B\Gamma} c_B(g,\xi)\, d\mu(g)\, d\nu_B(\xi) = \frac{1}{n} \int c_B(X_n(\omega),\xi)\, dP(\omega)\, d\nu_B(\xi).$

We have $|c_B(X_n,\xi)|/n \le |X_n|/n$, which converges in $L^1$ and almost surely to $\ell$. Hence,
the sequence of functions $c_B(X_n(\omega),\xi)/n$ is uniformly integrable on $\Omega \times \partial_B\Gamma$. Moreover,
$X_n$ converges almost surely to a point on the boundary $\partial\Gamma$, distributed according to the
exit measure, which has no atom. It follows that, for all $\xi$, the trajectory $X_n(\omega)$ converges
almost surely to a point different from $\pi_B(\xi)$. This implies that, almost surely, one has
$c_B(X_n,\xi) = |X_n| + O(1)$, giving in particular $c_B(X_n,\xi)/n \to \ell$. The result follows by taking
the limit in $n$ in the equality (2.4). □

This formula easily implies that the drift depends continuously on the measure, as
explained in [EK13].

Proposition 2.3. Let $(\Gamma,d)$ be a metric hyperbolic group. Consider a sequence of
probability measures $\mu_i$ with finite first moment, converging simply to a nonelementary probability
measure $\mu$ (i.e., $\mu_i(g) \to \mu(g)$ for all $g \in \Gamma$). Assume moreover that $L(\mu_i) \to L(\mu)$. Then
$\ell(\mu_i) \to \ell(\mu)$.

Proof. Let $\nu_i$ be stationary measures for $\mu_i$ on $\partial_B\Gamma$. Taking a subsequence if necessary, we
may assume that $\nu_i$ converges to a limiting measure $\nu$. By continuity of the action on the
boundary, it is stationary for $\mu$.

For each $g \in \Gamma$, the quantity $\int_{\partial_B\Gamma} c_B(g,\xi)\, d\nu_i(\xi)$ converges to $\int_{\partial_B\Gamma} c_B(g,\xi)\, d\nu(\xi)$ since
$\xi \mapsto c_B(g,\xi)$ is continuous. Averaging over $g$ (and using the assumption $L(\mu_i) \to L(\mu)$ to
get a uniform domination), we deduce that

$\sum_{g \in \Gamma} \mu_i(g) \int_{\partial_B\Gamma} c_B(g,\xi)\, d\nu_i(\xi) \to \sum_{g \in \Gamma} \mu(g) \int_{\partial_B\Gamma} c_B(g,\xi)\, d\nu(\xi).$

Together with the formula (2.3) for the drift, this completes the proof. □

In this proposition, it is important that $\mu$ is nonelementary: the result is wrong otherwise.
For instance, in the infinite dihedral group $\mathbb{Z} \rtimes \mathbb{Z}/2$, the measures
$\mu_i = (1 - 2^{-i})\delta_{(1,0)} + 2^{-i}\delta_{(0,1)}$ have zero drift since the $\mathbb{Z}/2$ element symmetrizes everything in $\mathbb{Z}$, while the limiting
measure $\mu = \delta_{(1,0)}$ has drift 1. The reason is the non-uniqueness of the stationary measure
for $\mu$ on the boundary.
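
The dihedral counterexample can be explored numerically. The sketch below is an illustration under explicit assumptions, not taken from the article: elements of $\mathbb{Z} \rtimes \mathbb{Z}/2$ are pairs `(k, s)` with product $(a,s)(b,t) = (a + (-1)^s b,\ s + t \bmod 2)$, the word length is taken for the generators $(\pm 1, 0)$ and $(0,1)$, so that $|(k,s)| = |k| + s$, and the function name `dihedral_mean_length` is hypothetical. It computes $L(\mu^{*n}) = E|X_n|$ exactly by propagating the distribution of $X_n$.

```python
def dihedral_mean_length(n, p_flip):
    """E|X_n| for the walk driven by
    mu = (1 - p_flip) * delta_(1,0) + p_flip * delta_(0,1) on Z x| Z/2."""
    law = {(0, 0): 1.0}                     # distribution of X_t, starting at e
    for _ in range(n):
        new = {}
        for (k, s), pr in law.items():
            # right multiplication by (1, 0): move by (-1)^s
            step = (k + (1 if s == 0 else -1), s)
            # right multiplication by (0, 1): flip the Z/2 coordinate
            flip = (k, 1 - s)
            new[step] = new.get(step, 0.0) + pr * (1 - p_flip)
            new[flip] = new.get(flip, 0.0) + pr * p_flip
        law = new
    return sum(pr * (abs(k) + s) for (k, s), pr in law.items())

for n in (100, 400):
    print(n, dihedral_mean_length(n, 0.25) / n)
```

For a fixed positive `p_flip` the printed ratio decreases towards 0 as `n` grows, while `p_flip = 0` gives exactly 1 for every `n`, which is the discontinuity described above.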
2.4. The entropy. Let $\Gamma$ be a countable group. Consider a probability measure $\mu$ on $\Gamma$
with finite time one entropy $H(\mu)$ (defined in (1.1)). The entropy of the random walk has
been defined in (1.2) as $h(\mu) = \lim H(\mu^{*n})/n$. Let $X_n = g_1 \cdots g_n$ be the position at time $n$
of the random walk generated by $\mu$ (where the $g_i$ are independent and distributed according
to $\mu$). Then, almost surely, $h(\mu) = \lim (-\log \mu^{*n}(X_n))/n$. The fundamental inequality (1.3)
shows that if $h > 0$ then $\ell > 0$.
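
To make the definition $h(\mu) = \lim H(\mu^{*n})/n$ concrete, here is a small Python sketch (an illustration under stated assumptions, not taken from the paper). It computes $H(\mu^{*n})/n$ exactly for small $n$ for the simple random walk on the free group $F_2$, encoding reduced words as tuples over $\{\pm 1, \pm 2\}$; the encoding and the function names are choices made here. For this walk the classical values are $\ell = 1/2$ and $h = \frac{1}{2}\log 3$, so the computed ratios should stay above $\frac{1}{2}\log 3$.

```python
import math
from collections import defaultdict

GENERATORS = (1, -1, 2, -2)  # a, a^-1, b, b^-1 in F_2


def step(dist):
    """One convolution step: multiply on the right by a uniform generator."""
    new = defaultdict(float)
    for word, p in dist.items():
        for s in GENERATORS:
            # Cancel s against the last letter when it is its inverse.
            nxt = word[:-1] if word and word[-1] == -s else word + (s,)
            new[nxt] += p / len(GENERATORS)
    return dict(new)


def shannon_entropy(dist):
    """Time one entropy H of a finitely supported distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def entropy_profile(n_max):
    """Return the list [H(mu^{*n})/n for n = 1..n_max]."""
    dist = {(): 1.0}
    profile = []
    for n in range(1, n_max + 1):
        dist = step(dist)
        profile.append(shannon_entropy(dist) / n)
    return profile
```

Calling `entropy_profile(6)` gives a sequence starting at $\log 4$ and decreasing slowly towards the asymptotic entropy.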

The entropy has several equivalent characterizations. The first one is in terms of the size
of the typical support of the random walk: this support has size roughly $e^{hn}$. The following
lemma follows from [Haï13, Proposition 1.13].

Lemma 2.4. Consider a probability measure $\mu$ with $H(\mu) < \infty$ on a countable group $\Gamma$. Let
$h = h(\mu)$ be its asymptotic entropy. Let $\eta > 0$ and $\varepsilon > 0$.

(1) For large enough $n$, there exists a subset $K_n$ of $\Gamma$ with $\mu^{*n}(K_n) \ge 1 - \eta$ and
$|K_n| \le e^{(h+\varepsilon)n}$.

(2) For large enough $n$, there exists no subset $K_n$ of $\Gamma$ with $\mu^{*n}(K_n) \ge \eta$ and
$|K_n| \le e^{(h-\varepsilon)n}$.

Another description is in terms of the Poisson boundary of the walk. To avoid general
definitions, let us only state this description for measures on hyperbolic groups. The following
proposition is a consequence of [Kai00].

Proposition 2.5. Let $\Gamma$ be a hyperbolic group. Let $\mu$ be a nonelementary probability measure
on $\Gamma$ with $H(\mu) < \infty$. Let $\nu$ be its unique stationary measure on $\partial\Gamma$. Define the Martin
cocycle on $\Gamma \times \partial\Gamma$ by $c_M(g,\xi) = -\log(dg^{-1}_*\nu/d\nu)(\xi)$. Then

(2.5)    $h(\mu) \ge \int_{\Gamma\times\partial\Gamma} c_M(g,\xi)\,d\mu(g)\,d\nu(\xi),$

with equality if $\mu$ has a logarithmic moment.

When $\mu$ has a logarithmic moment, this proposition has a very similar flavor to
Proposition 2.2 expressing the drift of a random walk. Indeed, for symmetric measures, [BHM11]
interprets Proposition 2.5 as a special case of Proposition 2.2, for a distance $d = d_\mu$ related
to the random walk, the Green distance, which we defined in Theorem 1.2. This distance
is hyperbolic if $\mu$ is admissible and has a superexponential moment, by [Anc87, Gou13]. It
is not geodesic in general, but this is not an issue since we were careful enough to state
Proposition 2.2 without this assumption. The Busemann cocycle for the Green distance is
precisely the Martin cocycle.

An important difference between the formulas (2.3) for the drift and (2.5) for the entropy
is that, in the latter situation, the cocycle $c_M$ depends on the measure $\nu$ (and, therefore, on
$\mu$). This makes it more complicated to prove continuity statements such as Proposition 2.3
for the entropy. Nevertheless, Erschler and Kaimanovich proved in [EK13] that, in
hyperbolic groups, the entropy also depends continuously on the measure. As $h(\mu) = \inf H(\mu^{*n})/n$
by subadditivity, it is easy to prove that when $\mu_i \to \mu$ one has $\limsup h(\mu_i) \le h(\mu)$. The
main difficulty to prove the continuity is to get lower bounds. We will need a slightly
stronger (and more pedestrian) version of the results of [EK13] to prove Theorem 1.4.
Although our argument may seem very different at first sight from the arguments in [EK13],
the techniques are in fact closely related (an illustration is that we can recover with our
techniques the result of Kaimanovich that, for measures with finite logarithmic moment,
equality holds in (2.5), i.e., the Poisson boundary coincides with the geometric boundary,
see Remark 2.11). Our main criterion to get lower bounds on the entropy is the following.
We write $\mathbb{S}^k = \{g \in \Gamma : |g| \in (k-1, k]\}$ for the thickened sphere, so that the union of these
spheres covers the whole group.

Theorem 2.6. Let $(\Gamma, d)$ be a metric hyperbolic group. Let $\mu_i$ be a sequence of nonelementary
probability measures on $\Gamma$ with $H(\mu_i) < \infty$. Let $\nu_i$ be the unique stationary measure
for $\mu_i$ on $\partial\Gamma$. Assume that:

(1) The limit points of $\nu_i$ have no atom.

(2) The sequence

(2.6)    $h_i = \sum_k \sum_{g\in\mathbb{S}^k} \mu_i(g)\,\bigl(-\log(\mu_i(g)/\mu_i(\mathbb{S}^k))\bigr)$

tends to infinity.

Then $\liminf h(\mu_i)/h_i \ge 1$.

The quantity $h_i$ can be written

    $h_i = \sum_{g\in\Gamma} \mu_i(g)(-\log \mu_i(g)) - \sum_k \mu_i(\mathbb{S}^k)(-\log \mu_i(\mathbb{S}^k)).$

The first term is the time one entropy $H(\mu_i)$ of the measure $\mu_i$. In most reasonable cases,
the second term is negligible. The theorem then states that the asymptotic entropy $h(\mu_i)$ is
comparable to the time one entropy $H(\mu_i)$. In other words, if the measure is supported close
to infinity, and sufficiently spread out in the group (this is the meaning of the assumption
that the limit points of $\nu_i$ have no atom), then there are few coincidences and the entropy
does not decrease significantly with time.
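
The two expressions for $h_i$ can be compared on a toy example. The Python sketch below is only an illustration: the dictionary representation of the measure, the `length` callback and the function names are assumptions made here. It evaluates (2.6) directly and checks that it equals the time one entropy minus the entropy of the sphere masses.

```python
import math
from collections import defaultdict


def sphere_index(g, length):
    """Index k of the thickened sphere {k - 1 < |g| <= k} containing g."""
    return math.ceil(length(g))


def sphere_entropy(mu, length):
    """Evaluate (2.6): sum over g of mu(g) * (-log(mu(g) / mu(S^k)))."""
    mass = defaultdict(float)
    for g, p in mu.items():
        mass[sphere_index(g, length)] += p
    return -sum(p * math.log(p / mass[sphere_index(g, length)])
                for g, p in mu.items() if p > 0)


def time_one_entropy(mu):
    """H(mu) = sum over g of mu(g) * (-log mu(g))."""
    return -sum(p * math.log(p) for p in mu.values() if p > 0)
```

With `mu = {"a": 0.5, "bb": 0.25, "cc": 0.25}` and `length = len`, the spheres of radius 1 and 2 each carry mass 1/2, so `time_one_entropy(mu) - sphere_entropy(mu, len)` equals $\log 2$, the entropy of the sphere masses.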

To prove this theorem, we will use the following technical lemma. 


Lemma 2.7. On a probability space $(X, \mu)$, consider a nonnegative function $f$ with average
1. For any subset $A$ of $X$,

    $\int_X (-\log f)\,d\mu \ge -\mu(A) \log\Bigl(\int_A f\,d\mu\Bigr) - 2e^{-1}.$

Proof. As the function $x \mapsto -\log x$ is convex, Jensen's inequality gives $\int (-\log f) \ge
-\log(\int f)$. The last quantity vanishes when $\int f = 1$.

Let $B \subset X$. Write $a = \int_B f\,d\mu/\mu(B)$. The measure $d\mu/\mu(B)$ is a probability measure
on $B$, and the function $f/a$ has integral 1 for this measure. The previous inequality gives
$\int_B (-\log(f/a))\,d\mu/\mu(B) \ge 0$, that is,

    $\int_B (-\log f)\,d\mu \ge -\mu(B) \log a = -\mu(B) \log\Bigl(\int_B f\Bigr) + \mu(B) \log \mu(B).$

The quantity $\mu(B) \log \mu(B)$ is bounded from below by $\inf_{[0,1]} x \log x = -e^{-1}$. Therefore,

    $\int_B (-\log f)\,d\mu \ge -\mu(B) \log\Bigl(\int_B f\Bigr) - e^{-1}.$

We apply this inequality to the complement $A^c$ of $A$. As $-\log(\int_{A^c} f) \ge 0$, we get a lower
bound $-e^{-1}$. Let us also apply this inequality to $A$, and add the results. We obtain

    $\int_X (-\log f)\,d\mu \ge -\mu(A) \log\Bigl(\int_A f\Bigr) - 2e^{-1}.$

□
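
As a sanity check of Lemma 2.7 (an illustration only, not a substitute for the proof), the following Python sketch evaluates both sides of the inequality on a finite probability space. The list-based representation and the function name are assumptions made for this example; the function $f$ is taken strictly positive so that all logarithms are finite.

```python
import math


def lemma_2_7_sides(mu, f, A):
    """Return (lhs, rhs) of Lemma 2.7 on a finite probability space.

    mu: list of point masses summing to 1.
    f:  list of positive values with sum(mu[i] * f[i]) == 1.
    A:  iterable of indices describing the subset A.
    """
    A = list(A)
    lhs = sum(m * -math.log(v) for m, v in zip(mu, f))
    mass_A = sum(mu[i] for i in A)
    integral_A = sum(mu[i] * f[i] for i in A)
    # Convention: the term mu(A) * log(int_A f) is 0 when mu(A) = 0.
    main_term = mass_A * math.log(integral_A) if mass_A > 0 else 0.0
    return lhs, -main_term - 2 / math.e
```

For `mu = [0.25] * 4` and `f = [0.1, 0.1, 0.1, 3.7]`, the inequality `lhs >= rhs` can be checked for each of the 16 subsets `A`.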


We will use the notion of shadow, due to Sullivan and considered in this context by
Coornaert [Coo93]. Let $C > 0$ be large enough. The shadow $\mathcal{O}(g,C)$ of $g \in \Gamma$ is
$\{\xi \in \partial\Gamma : (g|\xi)_e \ge |g| - C\}$. In geometric terms (and assuming the space is geodesic), this is essentially
the trace at infinity of geodesics originating from $e$ and going through the ball $B(g,C)$. We
will use the following properties of shadows [Coo93]:

(1) Their covering number is finite. More precisely, there exists $D > 0$ (depending on
$C$) such that, for any integer $k$, for any $\xi \in \partial\Gamma$,

    $|\{g \in \mathbb{S}^k : \xi \in \mathcal{O}(g,C)\}| \le D.$

(2) The preimages of shadows are large. More precisely, for any $\eta > 0$, there exists
$C > 0$ such that, for all $g \in \Gamma$, the complement of $g^{-1}\mathcal{O}(g,C)$ has diameter at most
$\eta$ (for a fixed visual distance on the boundary).


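Proof of Theorem 2.6. Fix $\varepsilon > 0$. As the limit points of $\nu_i$ have no atom, there exists $\eta > 0$
such that any ball of radius $\eta$ in $\partial\Gamma$ has measure at most $\varepsilon$ for $\nu_i$, for $i$ large enough. We can
then choose a shadow size $C$ so that $g^{-1}\mathcal{O}(g,C)$ has for all $g$ a complement with diameter
at most $\eta$. This yields $\nu_i(g^{-1}\mathcal{O}(g,C)) \ge 1 - \varepsilon$.

By (2.5), the entropy of $\mu_i$ satisfies

    $h(\mu_i) \ge \sum_{g\in\Gamma} \mu_i(g) \int_{\partial\Gamma} \Bigl(-\log \frac{dg^{-1}_*\nu_i}{d\nu_i}(\xi)\Bigr)\,d\nu_i(\xi).$

The function $f_g = dg^{-1}_*\nu_i/d\nu_i$ is nonnegative and has integral 1. For any $A \subset \partial\Gamma$,
Lemma 2.7 gives

    $\int_{\partial\Gamma} \Bigl(-\log \frac{dg^{-1}_*\nu_i}{d\nu_i}(\xi)\Bigr)\,d\nu_i(\xi) \ge -\nu_i(A) \log\Bigl(\int_A \frac{dg^{-1}_*\nu_i}{d\nu_i}(\xi)\,d\nu_i(\xi)\Bigr) - 2e^{-1}$
    $= -\nu_i(A) \log(g^{-1}_*\nu_i(A)) - 2e^{-1}$
    $= -\nu_i(A) \log(\nu_i(gA)) - 2e^{-1}.$

Let us take $A = g^{-1}\mathcal{O}(g,C)$, so that $\nu_i(A) \ge 1 - \varepsilon$. Summing over $g$, we get

(2.7)    $h(\mu_i) \ge (1-\varepsilon) \sum_{g\in\Gamma} \mu_i(g)\bigl(-\log \nu_i(\mathcal{O}(g,C))\bigr) - 2e^{-1}.$

We split the sum according to the spheres $\mathbb{S}^k$. Let $\Sigma_k = \sum_{g\in\mathbb{S}^k} \nu_i(\mathcal{O}(g,C))$, it is at most
$D$ since the shadows have a covering number bounded by $D$. We have

    $\sum_{g\in\mathbb{S}^k} \mu_i(g)\bigl(-\log \nu_i(\mathcal{O}(g,C))\bigr) = -\sum_{g\in\mathbb{S}^k} \mu_i(g) \Bigl[\log\Bigl(\frac{\nu_i(\mathcal{O}(g,C))}{\Sigma_k\,\mu_i(g)/\mu_i(\mathbb{S}^k)}\Bigr) + \log \Sigma_k + \log(\mu_i(g)/\mu_i(\mathbb{S}^k))\Bigr].$

The point of this decomposition is that the function on $\mathbb{S}^k$ given by
$\varphi : g \mapsto \nu_i(\mathcal{O}(g,C)) / (\Sigma_k\,\mu_i(g)/\mu_i(\mathbb{S}^k))$
has integral 1 for the probability measure $\mu_i(g)/\mu_i(\mathbb{S}^k)$. By Jensen's inequality, the integral
of $-\log \varphi$ is nonnegative. This yields

    $\sum_{g\in\mathbb{S}^k} \mu_i(g)\bigl(-\log \nu_i(\mathcal{O}(g,C))\bigr) \ge -\mu_i(\mathbb{S}^k) \log D + \sum_{g\in\mathbb{S}^k} \mu_i(g)\bigl(-\log(\mu_i(g)/\mu_i(\mathbb{S}^k))\bigr).$

Summing over $k$, we deduce from (2.7) the inequality

    $h(\mu_i) \ge (1-\varepsilon)h_i - 2e^{-1} - \log D.$

As $h_i$ tends to infinity, this gives $h(\mu_i) \ge (1-2\varepsilon)h_i$ for large enough $i$, completing the
proof. □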


To apply the previous theorem, we need to estimate $h_i$. In this respect, the following
lemma is often useful.


Lemma 2.8. Let $R_i \ge 1$. The quantity $h_i$ defined in (2.6) satisfies

    $h_i \ge \sum_{|g|\le R_i} \mu_i(g)(-\log \mu_i(g)) - \log(2 + R_i).$

Proof. In the definition of $h_i$, all the terms are nonnegative. Restricting the sum to those $g$
with $|g| \le R_i$, we get

    $h_i \ge \sum_{k\le R_i} \sum_{g\in\mathbb{S}^k} \mu_i(g)\bigl(-\log(\mu_i(g)/\mu_i(\mathbb{S}^k))\bigr)$
    $= \sum_{|g|\le R_i} \mu_i(g)(-\log \mu_i(g)) - \sum_{k\le R_i} \mu_i(\mathbb{S}^k)(-\log \mu_i(\mathbb{S}^k)).$

A probability measure supported on a set with $N$ elements has entropy at most $\log N$. The
numbers $\mu_i(\mathbb{S}^k)$ for $0 \le k \le R_i$ do not form a probability measure in general, so let us add a last
atom with mass $m = \mu_i(\bigcup_{k>R_i} \mathbb{S}^k)$. We are considering a space of cardinality at most $R_i + 2$, hence

    $m(-\log m) + \sum_{k\le R_i} \mu_i(\mathbb{S}^k)(-\log \mu_i(\mathbb{S}^k)) \le \log(2 + R_i),$

completing the proof. □

Let us see how Theorem 2.6 implies a slightly stronger version of the continuity result for 
the entropy of Erschler and Kaimanovich [EK13]. 





Theorem 2.9. Let $\Gamma$ be a hyperbolic group. Consider a probability measure $\mu$ with finite
time one entropy and finite logarithmic moment. Let $\mu_i$ be a sequence of probability measures
converging simply to $\mu$ with $H(\mu_i) \to H(\mu)$. Then $h(\mu_i) \to h(\mu)$.

The assumption $H(\mu_i) \to H(\mu)$ ensures that there is no additional entropy in $\mu_i$ coming
from neighborhoods of infinity that would disappear in the limit. It is automatic if the
support of $\mu_i$ is uniformly bounded or if $\mu_i$ satisfies a uniform $L^1$ domination, but it is much
weaker. For instance, it is allowed that the $\mu_i$ have no finite logarithmic moment.

The main lemma for the proof is a lower bound on the entropy, following from
Theorem 2.6.

Lemma 2.10. Let $\Gamma$ be a hyperbolic group. Consider a probability measure $\mu$ with finite time
one entropy and finite logarithmic moment. Let $\mu_i$ be a sequence of measures converging
simply to $\mu$. Then $\liminf h(\mu_i) \ge h(\mu)$.

Proof. Since the result is trivial if $h(\mu) = 0$, we can assume that $h(\mu) > 0$.

Let $\varepsilon > 0$. For large $n$, most atoms for $\mu^{*n}$ have a probability at most $e^{-(1-\varepsilon)nh(\mu)}$.
Moreover, since $\mu$ has a finite logarithmic moment, $\log|X_n|/n$ tends almost surely to 0
by [Aar97, Proposition 2.3.1]. Therefore, the set

    $K_n = \{g : \mu^{*n}(g) \le e^{-(1-\varepsilon)nh(\mu)},\ |g| \le e^{\varepsilon n}\}$

has measure tending to 1. In particular $\mu^{*n}(K_n) \ge 1 - \varepsilon$ for large $n$. We get

    $\sum_{|g|\le e^{\varepsilon n}} \mu^{*n}(g)(-\log \mu^{*n}(g)) \ge \sum_{g\in K_n} \mu^{*n}(g)(-\log \mu^{*n}(g)) \ge \sum_{g\in K_n} \mu^{*n}(g)(1-\varepsilon)nh(\mu)$
    $= \mu^{*n}(K_n)(1-\varepsilon)nh(\mu) \ge (1-\varepsilon)^2 nh(\mu).$

For each fixed $n$, the measures $\mu_i^{*n}$ converge to $\mu^{*n}$ when $i$ tends to infinity. Hence, we get
for large enough $i$ the inequality

    $\sum_{|g|\le e^{\varepsilon n}} \mu_i^{*n}(g)(-\log \mu_i^{*n}(g)) \ge (1-\varepsilon)^3 nh(\mu).$

Letting $\varepsilon$ tend to 0 (and, therefore, $n$ to infinity), we deduce the existence of sequences
$n_i \to \infty$ and $\varepsilon_i \to 0$ such that, for any $i$,

    $\sum_{|g|\le e^{\varepsilon_i n_i}} \mu_i^{*n_i}(g)(-\log \mu_i^{*n_i}(g)) \ge (1-\varepsilon_i)^3 n_i h(\mu).$

Let $\tilde\mu_i = \mu_i^{*n_i}$. Its stationary measure $\nu_i$ is also the stationary measure of $\mu_i$, by uniqueness.
Any limit point of $\nu_i$ is stationary for $\mu$, and is therefore atomless since $\mu$ is nonelementary
as $h(\mu) > 0$. The assumptions of Theorem 2.6 are satisfied by the sequence $\tilde\mu_i$. Moreover,
Lemma 2.8 (applied with $R_i = e^{\varepsilon_i n_i}$) yields

    $\tilde h_i \ge (1-\varepsilon_i)^3 n_i h(\mu) - \log(2 + e^{\varepsilon_i n_i}) \ge (1 - C\varepsilon_i) n_i h(\mu).$

Theorem 2.6 ensures that $\liminf h(\tilde\mu_i)/\tilde h_i \ge 1$. As $h(\tilde\mu_i) = n_i h(\mu_i)$, this gives $\liminf h(\mu_i) \ge
h(\mu)$ as desired. □

Proof of Theorem 2.9. For fixed $n$, the sequence $\mu_i^{*n}$ converges simply to $\mu^{*n}$. Moreover,
$H(\mu_i^{*n}) \to H(\mu^{*n})$ since there is no loss of entropy at infinity by assumption. Choose $n$
such that $H(\mu^{*n}) < n(1+\varepsilon)h(\mu)$. We get $H(\mu_i^{*n})/n \le (1+2\varepsilon)h(\mu)$ for large enough $i$. As
$h(\mu_i) \le H(\mu_i^{*n})/n$, this shows that $\limsup h(\mu_i) \le h(\mu)$ (this is the classical semi-continuity
property of entropy, valid in any group).

For the reverse inequality $\liminf h(\mu_i) \ge h(\mu)$, we apply Lemma 2.10. □

Remark 2.11. Let $h(\mu,\partial\Gamma) = \int_{\Gamma\times\partial\Gamma} (-\log dg^{-1}_*\nu/d\nu)(\xi)\,d\mu(g)\,d\nu(\xi)$ where $\nu$ is the
stationary measure for $\mu$ on $\partial\Gamma$. In general, $h(\mu) \ge h(\mu,\partial\Gamma)$ with equality if and only if $(\partial\Gamma,\nu)$
is the Poisson boundary of $(\Gamma,\mu)$. A theorem of Kaimanovich [Kai00] asserts that, when
$\mu$ has finite entropy and finite logarithmic moment, $h(\mu,\partial\Gamma) = h(\mu)$. We can recover this
theorem using the previous arguments. Indeed, what the proof of Theorem 2.6 really shows
is that $\liminf h(\mu_i,\partial\Gamma)/h_i \ge 1$. Hence, Lemma 2.10 proves that $\liminf h(\mu_i,\partial\Gamma) \ge h(\mu)$ if
$\mu_i$ converges simply to a measure $\mu$ with a logarithmic moment. Taking $\mu_i = \mu$ for all $i$, we
obtain in particular $h(\mu,\partial\Gamma) \ge h(\mu)$, as desired.

2.5. A criterion to bound the entropy from below. In order to prove Theorem 1.4 
on the entropy of the uniform measure on balls, we want to apply Theorem 2.6. Thus, we 
need a criterion to check that limit points of stationary measures have no atom. 


Lemma 2.12. Let $\Gamma$ be a hyperbolic group. Let $\mu_i$ be a sequence of probability measures
on $\Gamma$. Assume that, on the space $\Gamma \cup \partial\Gamma$, the sequence $\mu_i$ converges to a limit $\nu$ which is
supported on $\partial\Gamma$. Assume moreover that the limit points of $\check\mu_i$ (defined by $\check\mu_i(g) = \mu_i(g^{-1})$)
have no atom. Then the stationary measures $\nu_i$ associated to $\mu_i$ also converge to $\nu$.


Proof. We fix a word distance $d$ on $\Gamma$. Let $f$ be a continuous function on $\Gamma \cup \partial\Gamma$. Let us
show that, uniformly in $\xi \in \partial\Gamma$, the integral $\int f(g\xi)\,d\mu_i(g)$ is close to $\int f(g)\,d\mu_i(g)$. We
estimate the difference as

    $\Bigl|\int (f(g\xi) - f(g))\,d\mu_i(g)\Bigr| \le \int |f(g\xi) - f(g)|\,1((g\xi|g)_e > C)\,d\mu_i(g) + 2\|f\|_\infty \int 1((g\xi|g)_e \le C)\,d\mu_i(g),$

where $C$ is a fixed constant. If $C$ is large enough, $|f(x) - f(y)| \le \varepsilon$ when $(x|y)_e > C$, by
uniform continuity of $f$. Hence, the first integral is bounded by $\varepsilon$. For the second integral,
we use the formula $(gx|g)_e = |g| - (x|g^{-1})_e$, valid for any $x \in \Gamma$ (it follows readily from the
definition (2.1) of the Gromov product). This equality does not extend to the boundary since
the Gromov product there is only well defined up to an additive constant $D$. Nevertheless,
we get $(g\xi|g)_e \ge |g| - (\xi|g^{-1})_e - D$. Hence, the second integral is bounded by

(2.8)    $\mu_i\{g : |g| - C - D \le (\xi|g^{-1})_e\}.$

If $|g|$ is large, the points $g$ with $(\xi|g^{-1})_e \ge |g| - C - D$ are such that $g^{-1}$ belongs to a small
neighborhood of $\xi$ in $\Gamma \cup \partial\Gamma$. As the limit points of $\check\mu_i$ are supported on $\partial\Gamma$ and have no
atom, it follows that (2.8) converges to 0 when $i$ tends to infinity, uniformly in $\xi$.

We have proved that

    $\sup_{\xi\in\partial\Gamma} \Bigl|\int f(g\xi)\,d\mu_i(g) - \int f(g)\,d\mu_i(g)\Bigr| \to 0.$

By stationarity,

    $\int_{\xi\in\partial\Gamma} f(\xi)\,d\nu_i(\xi) = \int_{\xi\in\partial\Gamma} \Bigl(\int f(g\xi)\,d\mu_i(g)\Bigr)\,d\nu_i(\xi).$

Combining these equations, we get $\int f(\xi)\,d\nu_i(\xi) - \int f(g)\,d\mu_i(g) \to 0$. This shows that the
limit points of $\nu_i$ and $\mu_i$ are the same. □
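
The identity $(gx|g)_e = |g| - (x|g^{-1})_e$ used in the proof above can be tested mechanically in a concrete group. The Python sketch below is an illustration only: it works in the free group $F_2$ with its standard word distance, encodes reduced words as tuples over $\{\pm 1, \pm 2\}$, and uses function names chosen here.

```python
import random


def reduce_word(word):
    """Freely reduce a word given as a sequence of letters in {1, -1, 2, -2}."""
    out = []
    for s in word:
        if out and out[-1] == -s:
            out.pop()
        else:
            out.append(s)
    return tuple(out)


def inverse(word):
    return tuple(-s for s in reversed(word))


def product(g, h):
    return reduce_word(tuple(g) + tuple(h))


def length(g):
    return len(reduce_word(g))


def gromov_product(x, y):
    """(x|y)_e = (|x| + |y| - d(x, y)) / 2 with d(x, y) = |x^{-1} y|."""
    return (length(x) + length(y) - length(product(inverse(x), y))) / 2


def check_identity(trials=1000, seed=0):
    """Test (gx|g)_e == |g| - (x|g^{-1})_e on random pairs (g, x)."""
    rng = random.Random(seed)
    letters = (1, -1, 2, -2)
    for _ in range(trials):
        g = reduce_word(rng.choice(letters) for _ in range(12))
        x = reduce_word(rng.choice(letters) for _ in range(12))
        if gromov_product(product(g, x), g) != length(g) - gromov_product(x, inverse(g)):
            return False
    return True
```

The check succeeds for every pair because both sides equal $(|gx| + |g| - |x|)/2$, which is exactly the computation behind the formula.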

Let us now consider the uniform measure $\rho_i$ on the ball of radius $i$, as in Theorem 1.4.
The next lemma follows from the techniques of [Coo93].

Lemma 2.13. Let $(\Gamma,d)$ be a metric hyperbolic group. Let $\rho_i$ be the uniform measure on
the ball of radius $i$. Let $\rho_\infty$ be the Patterson-Sullivan measure of $(\Gamma,d)$ constructed in [Coo93] (it
is supported on $\partial\Gamma$ and atomless). Then the limit points of $\rho_i$ are equivalent to $\rho_\infty$, with a
density bounded from above and from below.

Proof. Let $C$ be large enough. We will use the shadows $\mathcal{O}(g,C)$ as defined before the proof
of Theorem 2.6. The main property of $\rho_\infty$ is that it satisfies

(2.9)    $K_0^{-1} e^{-v|g|} \le \rho_\infty(\mathcal{O}(g,C)) \le K_0 e^{-v|g|},$

where $K_0$ is a constant only depending on $C$ and $v$ is the growth of $(\Gamma,d)$ (Proposition 6.1
in [Coo93]).

Let $\tilde\rho_i$ be the uniform measure on thickened spheres $\tilde S_i = \{g : i \le |g| \le i + L\}$, where
$L$ is large enough so that the cardinality of $\tilde S_i$ grows like $e^{iv}$, see the proof of Theorem
7.2 in [Coo93]. Let us push $\tilde\rho_i$ to a measure $\bar\rho_i$ on $\partial\Gamma$, by choosing for each $g \in \tilde S_i$ a
corresponding point in its shadow. It is clear that $\tilde\rho_i$ and $\bar\rho_i$ have the same limit points,
since the diameter of the shadows tends uniformly to 0 when $i \to \infty$. We will prove that
the limit points of $\bar\rho_i$ are equivalent to $\rho_\infty$. The same result follows for $\tilde\rho_i$ and then $\rho_i$.

The shadows of $g \in \tilde S_i$ have a covering number which is bounded from above by a constant
$D$, and from below by 1 if $C$ is large enough. Hence, the measures $\bar\rho_i$ satisfy

    $K_1^{-1} e^{-iv} \le \bar\rho_i(\mathcal{O}(g,C)) \le K_1 e^{-iv},$

for any $g \in \tilde S_i$. This is comparable to $\rho_\infty(\mathcal{O}(g,C))$ by (2.9), up to a multiplicative constant
$K_2$. Consider a limit $\rho$ of a sequence $\bar\rho_{i_n}$, and let us prove that it is uniformly equivalent to $\rho_\infty$.
We will only prove that $\rho \le DK_2\rho_\infty$; the other inequality is proved in the same way. By
regularity of the measures, it suffices to check this inequality on compact sets.

Let $A$ be a compact subset of $\partial\Gamma$, and $\varepsilon > 0$. By regularity of the measure $\rho_\infty$, there is an
open neighborhood $U$ of $A$ with $\rho_\infty(U) \le \rho_\infty(A) + \varepsilon$. Consider $B$ a compact neighborhood
of $A$, included in $U$, with $\rho(\partial B) = 0$ (such a set exists, since among the sets $B_r = \{\xi :
d(\xi,A) \le r\}$, at most countably many have a boundary with nonzero measure).

For large enough $i$, the shadows $\mathcal{O}(g,C)$ with $g \in \tilde S_i$ which intersect $B$ are contained in $U$.
Therefore,

    $\bar\rho_i(B) \le \sum_{g\in\tilde S_i,\ \mathcal{O}(g,C)\cap B\neq\emptyset} \bar\rho_i(\mathcal{O}(g,C)) \le K_2 \sum_{g\in\tilde S_i,\ \mathcal{O}(g,C)\cap B\neq\emptyset} \rho_\infty(\mathcal{O}(g,C)) \le DK_2\rho_\infty(U).$

As $\rho(\partial B) = 0$, the sequence $\bar\rho_{i_n}(B)$ tends to $\rho(B)$. We obtain $\rho(B) \le DK_2\rho_\infty(U)$. As
$A$ is included in $B$, we get $\rho(A) \le DK_2(\rho_\infty(A) + \varepsilon)$. Letting $\varepsilon$ tend to 0, this gives
$\rho(A) \le DK_2\rho_\infty(A)$, as desired. □

Proof of Theorem 1.4. Let $\rho_i$ be the uniform measure on the ball of radius $i$ (which has
cardinality in $[C^{-1}e^{iv}, Ce^{iv}]$). We wish to apply Theorem 2.6 to this sequence of measures.
First, by Lemmas 2.12 and 2.13, the limit points of the stationary measures $\nu_i$ are equivalent
to the Patterson-Sullivan measure. Therefore, they have no atom. Second, Lemma 2.8 shows
that the quantity $h_i$ in (2.6) satisfies $h_i \ge iv - \log C - \log(2 + i)$. This tends to infinity.
Hence, Theorem 2.6 applies, and gives $h(\rho_i) \ge (1-\varepsilon)iv$ for large $i$.

Using the fundamental inequality $h \le \ell v$ and the trivial bound $\ell(\rho_i) \le L(\rho_i) \le i$, we get

    $(1-\varepsilon)iv \le h(\rho_i) \le \ell(\rho_i)v \le iv.$

It follows that $h(\rho_i) \sim iv$ and $\ell(\rho_i) \sim i$. □
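
The estimate $h_i \ge iv - \log C - \log(2+i)$ can be made explicit in the simplest example. The Python sketch below is an illustration under assumptions chosen here: it takes the free group $F_2$ with its standard generators (which is virtually free, so it is not covered by the strict inequality results of this paper, but Theorem 2.6 and Lemma 2.8 still apply), where spheres have cardinality $4 \cdot 3^{k-1}$ and $v = \log 3$. For the uniform measure on the ball $B_i$, the quantity (2.6) reduces to $\sum_k (|\mathbb{S}^k|/|B_i|) \log|\mathbb{S}^k|$.

```python
import math


def sphere_size(k):
    """Cardinality of the sphere of radius k in F_2 (standard generators)."""
    return 1 if k == 0 else 4 * 3 ** (k - 1)


def ball_size(i):
    return sum(sphere_size(k) for k in range(i + 1))


def h_uniform_ball(i):
    """Quantity (2.6) for the uniform measure on the ball of radius i."""
    total = ball_size(i)
    return sum(sphere_size(k) / total * math.log(sphere_size(k))
               for k in range(i + 1))
```

For `i = 50`, the ratio `h_uniform_ball(50) / (50 * math.log(3))` is already within one percent of 1, in line with $h_i \sim iv$.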

Remark 2.14. Our technique also applies to estimate the entropy of other measures, for
instance the measure $\mu_s = \sum e^{-s|g|}\delta_g / \sum e^{-s|g|}$ classically used in the construction of the
Patterson-Sullivan measure. Indeed, $\mu_s$ converges when $s \searrow v$ to $\rho_\infty$, which has no atom.
Moreover, writing $Z_s = \sum e^{-s|g|}$, we have $H(\mu_s) = sL(\mu_s) + \log Z_s$. One checks that $\log Z_s$
is negligible with respect to $H(\mu_s)$, and that the quantity $h_s$ from (2.6) is also equivalent
to $H(\mu_s)$. Hence, Theorem 2.6 gives

    $H(\mu_s)(1 + o(1)) \le h_s(1 + o(1)) \le h(\mu_s) \le \ell(\mu_s)v \le L(\mu_s)v \le H(\mu_s)(1 + o(1)).$

These inequalities show that $h(\mu_s)/\ell(\mu_s) \to v$.
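
The identity $H(\mu_s) = sL(\mu_s) + \log Z_s$ follows from $-\log \mu_s(g) = s|g| + \log Z_s$. As a purely numerical illustration, the Python sketch below evaluates the three quantities for the measure proportional to $e^{-s|g|}$ on a large ball of the free group $F_2$. The truncation radius, the choice of $F_2$ (where $v = \log 3$), and the function name are assumptions made for this example; the identity holds exactly for the truncated measure as well.

```python
import math


def radial_measure_stats(s, radius):
    """Return (Z, L, H) for the measure proportional to exp(-s|g|) on B(e, radius) in F_2."""
    sizes = [1] + [4 * 3 ** (k - 1) for k in range(1, radius + 1)]
    # Normalizing constant Z = sum over g of exp(-s|g|), grouped by spheres.
    Z = sum(n * math.exp(-s * k) for k, n in enumerate(sizes))
    L = 0.0  # first moment: sum over g of |g| mu(g)
    H = 0.0  # time one entropy: sum over g of mu(g) (-log mu(g))
    for k, n in enumerate(sizes):
        p = math.exp(-s * k) / Z  # mass of a single element of length k
        L += n * p * k
        H -= n * p * math.log(p)
    return Z, L, H
```

With `Z, L, H = radial_measure_stats(1.2, 200)`, the values `H` and `1.2 * L + math.log(Z)` agree up to rounding errors.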

Remark 2.15. One could imagine another strategy to find finitely supported measures $\mu_i$
for which $h(\mu_i)/\ell(\mu_i) \to v$. First, find a nice measure $\mu$ for which the stationary measure
$\nu$ at infinity is precisely the Patterson-Sullivan measure (which implies that $h(\mu) = \ell(\mu)v$
since the Martin cocycle and the Busemann cocycle coincide). Let $\mu_i$ be a truncation of
$\mu$. Since it converges to $\mu$, the continuity results for the drift and the entropy imply that
$h(\mu_i)/\ell(\mu_i) \to h(\mu)/\ell(\mu) = v$.

We were not able to implement this strategy successfully. Given a measure $\nu$, there is
a general technique due to Connell and Muchnik [CM07] to get a measure $\mu$ on $\Gamma$ with
$\mu * \nu = \nu$. This technique requires a continuity assumption on $\xi \mapsto (dg_*\nu/d\nu)(\xi)$, which is
not satisfied in our setting for $\nu = \rho_\infty$. However, in nice groups such as surface groups, this
function is, for every $g$, continuous at all but finitely many points. The technique of [CM07]
can be adapted to such a situation (in the proof of their Theorem 6.2, one should just take
sets $Y_n$ that avoid the discontinuities of the spikes we have already used). Unfortunately,
the resulting measure $\mu$ (which satisfies $\mu * \nu = \nu$) has infinite moment and infinite entropy,
and is therefore useless for our purposes.

3. Rigidity for admissible measures 

In this section, we prove Theorem 1.5. Assume that $(\Gamma,d)$ is a hyperbolic group endowed
with a word distance, which is not virtually free. Let $\mu$ be a probability measure on $\Gamma$, with
a superexponential moment, such that $\Gamma_\mu^+$ is a finite index subgroup of $\Gamma$. We want to prove
that $h(\mu) < \ell(\mu)v$. We argue by contradiction, assuming that $h(\mu) = \ell(\mu)v$. Assume first
that $\Gamma_\mu^+ = \Gamma$.

Since we are assuming the equality $h(\mu) = \ell(\mu)v$, Theorem 1.2 implies that

    $|d_\mu(e,g) - vd(e,g)| \le C.$

As a warm-up, let us first deal with the baby case $C = 0$. Then the distances $d_\mu$ and $d$ are
proportional, hence they define the same Busemann boundary. The Busemann boundary
$\partial_B\Gamma$ corresponding to $d$ is totally disconnected since the distance $d$ takes integer values
(it is a word distance). On the other hand, the Busemann boundary associated to the
Green metric $d_\mu$ is known as the Martin boundary of the random walk $(\Gamma,\mu)$. By [Anc87]
and [Gou13], it is homeomorphic to the boundary $\partial\Gamma$ of $\Gamma$. Since the group $\Gamma$ is not
virtually free, its boundary $\partial\Gamma$ is not totally disconnected (see [KB02, Theorem 8.1]), hence
a contradiction.

Let us now go back to the general situation, when $C$ is nonzero (but still assuming
$\Gamma_\mu^+ = \Gamma$). The argument is more complicated, but it still relies on the same facts: the
boundary is not totally disconnected, while the word distance is integer valued (we will
not use this fact directly, but rather the fact that stable translation lengths are rational, see
Lemma 3.4). These two opposite features will give rise to a contradiction.

In order to get rid of the constant $C$, we will need a homogenized version of the inequality
$|d_\mu(e,g) - vd(e,g)| \le C$. This is Lemma 3.1 below. The homogenized quantity associated
to the distance $d$ is called the stable translation length. For an element $g$ of $\Gamma$, it is defined
by $l(g) = \lim |g^n|/n$ (it exists by subadditivity).
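
For a concrete feeling of the stable translation length, consider the free group $F_2$ with its standard generators, where $l(g)$ is the length of the cyclic reduction of $g$. The Python sketch below is an illustration with names and encoding chosen here (reduced words as tuples over $\{\pm 1, \pm 2\}$); it compares $|g^n|/n$ with the cyclically reduced length.

```python
def reduce_word(word):
    """Freely reduce a word given as a sequence of letters in {1, -1, 2, -2}."""
    out = []
    for s in word:
        if out and out[-1] == -s:
            out.pop()
        else:
            out.append(s)
    return tuple(out)


def power(g, n):
    """Reduced word representing g^n for n >= 0."""
    return reduce_word(tuple(g) * n)


def cyclically_reduced_length(g):
    """Length of g after removing matching first/last letters repeatedly."""
    w = list(reduce_word(g))
    while len(w) >= 2 and w[0] == -w[-1]:
        w = w[1:-1]
    return len(w)


def translation_length_estimate(g, n):
    """The ratio |g^n| / n, which converges to l(g) as n grows."""
    return len(power(g, n)) / n
```

For `g = (1, 2, -1)`, that is $g = aba^{-1}$, one has $|g^n| = n + 2$, so `translation_length_estimate(g, n)` tends to `cyclically_reduced_length(g) == 1`. In this example the stable translation lengths are integers.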

Recall that we write $c_M(g,\xi)$ for the Martin cocycle associated to the random walk,
defined in Proposition 2.5. It satisfies the cocycle relation of Definition 2.1. We will not
use its probabilistic definition, but rather the fact that the Martin cocycle is the Busemann
cocycle associated to the Green distance $d_\mu$ of Theorem 1.2. In other words, $c_M(g,\xi) =
\lim_{x\to\xi} d_\mu(g^{-1},x) - d_\mu(e,x)$ (and this limit exists).

Lemma 3.1. For $g \in \Gamma$ with infinite order, $c_M(g, g^+) = vl(g)$.

Proof. Recall that we are assuming that the equality $h(\mu) = \ell(\mu)v$ holds, therefore we have
$|d_\mu(e,g) - vd(e,g)| \le C$. It follows that the cocycle $c_M$ corresponding to $d_\mu$ and the cocycle
$c_B$ corresponding to the distance $d$ satisfy $|c_M - vc_B| \le 2C$. Note that $c_B$ is not defined on
the geometric boundary, but on the horoboundary, so the proper way to write this inequality
is $|c_M(g, \pi_B(\xi)) - vc_B(g,\xi)| \le 2C$ for any $g \in \Gamma$ and any $\xi \in \partial_B\Gamma$.

Let $\xi \in \partial_B\Gamma$ with $\pi_B(\xi) \neq g^-$. Then $\lim c_B(g^n,\xi)/n = \lim h_\xi(g^{-n})/n = l(g)$. We choose
$\xi$ with $\pi_B(\xi) = g^+$, to get

    $\lim c_M(g^n, g^+)/n = \lim vc_B(g^n,\xi)/n \pm 2C/n = vl(g).$

As $g^+$ is $g$-invariant, the cocycle equation for $c_M$ on $\partial\Gamma$ gives $c_M(g, g^+) = c_M(g^n, g^+)/n$.
This converges to $vl(g)$ when $n \to \infty$ by the previous equation. □

The proof of Theorem 1.5 uses the following general result on cocycles.

Proposition 3.2. Let $\Gamma$ be a hyperbolic group which is not virtually free. Let $c : \Gamma \times \partial\Gamma \to \mathbb{R}$
be a Hölder cocycle, such that any hyperbolic element $g$ satisfies $c(g, g^+) \in \mathbb{Z}$. Then there
exists a hyperbolic element $g \in \Gamma$ with $c(g, g^-) = c(g, g^+)$.

Applied to the Busemann cocycle, this proposition implies that if a convex cocompact
negatively curved manifold has a fundamental group which is not virtually free, then its
length spectrum is not arithmetic, i.e., the lengths of its closed geodesics generate a dense
subgroup of $\mathbb{R}$. This result is already known, see [Dal99, Page 205]. It is proved in this article
using crossratios. This argument based on crossratios can be used to prove Proposition 3.2
in full generality. However, we will give a different, more direct, proof.

We will use the following topological lemma. 

Lemma 3.3. Let $g$ be a hyperbolic element in a hyperbolic group $\Lambda$ with connected boundary.
There exists an arc $I$ (i.e., a subset of $\partial\Lambda$ homeomorphic to $[0,1]$) joining $g^-$ and $g^+$,
invariant under an iterate $g^i$ of $g$.

Proof. We will use nontrivial results on the topology of $\partial\Lambda$. When it is connected, then
it is also locally connected by [Swa96]. Hence, it is also path connected and locally path
connected, see [HY61, Theorem 3-16]. Moreover, for any $\xi \in \partial\Lambda$, the space $\partial\Lambda \setminus \{\xi\}$ has
finitely many ends by [Bow98b].

Consider $g$ as in the statement of the lemma. Its action permutes the ends of $\partial\Lambda \setminus \{g^-\}$.
Taking an iterate of $g$, we can assume it stabilizes the ends. If $\xi$ is close to $g^-$, then so
is $g\xi$. As they belong to the same end, one can join them by a small arc $J$ that avoids
$g^-$ (and $g^+$). Then $\bigcup_{n\in\mathbb{Z}} g^n J$ joins $g^-$ to $g^+$, and it is invariant under $g$. However, it is not
necessarily an arc if $g^i J$ intersects $J$ in a nontrivial way for $i \neq 0$. To get a real arc, we will
shorten $J$ as follows.

As $g^n J$ converges to $g^\pm$ when $n$ tends to $\pm\infty$, the arc $J$ can only intersect finitely many
$g^i J$. Let us fix a parametrization $u : [0,1] \to J$. The quantity

    $\inf\{|t - s| : s, t \in [0,1] \text{ and } \exists i \neq 0,\ u(t) = g^i u(s)\}$

is realized by compactness (since $i$ remains bounded), for some parameters $s, t, i$. Replacing
$s, t, i$ with $t, s, -i$ if necessary, we may assume $i > 0$. As $g^-$ and $g^+$ are the only fixed
points of $g^i$, we have $s \neq t$. Let $K = u([s,t])$, this is an arc between $\eta = u(s)$ and
$g^i\eta = u(t)$. Moreover, $g^j K$ does not intersect $K$, except maybe at its endpoints for $j = \pm i$:
otherwise, there exists $x$ in the interior of $K$ such that $g^j x$ also belongs to $K$, contradicting
the minimality of $|s - t|$.

It follows that $\bigcup_{n\in\mathbb{Z}} g^{ni} K$ is an arc from $g^-$ to $g^+$, invariant under $g^i$. □

Proof of Proposition 3.2. Let us consider the cocycle $\bar c = c \bmod \mathbb{Z}$. The assumption of the
proposition ensures that $\bar c(g, g^+) = 0$ for all hyperbolic elements $g$. In geometric terms, this
would correspond to an assumption that the cocycle has vanishing average on all closed
orbits. Hence, we may apply a version of Livsic's theorem, due in this context to [INO08]
(Theorem 5.1). It ensures that the cocycle $\bar c$ is a coboundary: there exists a Hölder
continuous function $b : \partial\Gamma \to \mathbb{R}/\mathbb{Z}$ such that, for all $\xi \in \partial\Gamma$, for all $g \in \Gamma$,

(3.1)    $\bar c(g,\xi) = b(g\xi) - b(\xi).$

Recall that, since the group $\Gamma$ is not virtually free, its boundary is not totally
disconnected (see [KB02, Theorem 8.1]). The stabilizer of a nontrivial component $L$ of $\partial\Gamma$ is a
subgroup $\Lambda$ of $\Gamma$, quasi-convex hence hyperbolic, whose boundary is $L$ (see the discussion
on top of Page 55 in [Bow98a]).

Let us consider an infinite order element $g \in \Lambda$. Lemma 3.3 constructs an arc $I$ from $g^-$
to $g^+$ in $\partial\Lambda \subset \partial\Gamma$, invariant under an iterate $g^i$ of $g$. Replacing $g$ with $g^i$, we may assume
$i = 1$.

The restriction of the function $b$ to the arc $I$ admits a continuous lift $\tilde b : I \to \mathbb{R}$, as
$I$ is simply connected. The function $F : \xi \mapsto c(g,\xi) - \tilde b(g\xi) + \tilde b(\xi)$ is well defined on
$I$, continuous, and it vanishes modulo $\mathbb{Z}$ by (3.1). Hence, it is constant. In particular,
$c(g, g^-) = F(g^-) = F(g^+) = c(g, g^+)$. □

In order to apply Proposition 3.2, we will need the following result on stable translation
lengths in hyperbolic groups ([BH99, Theorem III.Γ.3.17]).

Lemma 3.4. Let $(\Gamma, d)$ be a hyperbolic group with a word distance. Then there exists an
integer $N$ such that, for any $g \in \Gamma$, one has $Nl(g) \in \mathbb{Z}$.

The combination of Lemma 3.1 and Lemma 3.4 shows that the cocycle $c = Nc_M/v$
satisfies $c(g, g^+) \in \mathbb{Z}$ for any hyperbolic element $g$. Moreover, this cocycle is Hölder-continuous
since the Martin cocycle $c_M$ is itself Hölder-continuous. This follows from [INO08] if $\mu$ has
finite support, and from [Gou13] if it has a superexponential moment. Now, Proposition 3.2
implies the existence of a hyperbolic element $g$ such that $c_M(g, g^+) = c_M(g, g^-)$. This is a
contradiction since $c_M(g, g^+) = vl(g) > 0$ and $c_M(g, g^-) = -c_M(g^{-1}, g^-) = -vl(g) < 0$, again by
Lemma 3.1. This concludes the proof of Theorem 1.5 when $\Gamma_\mu^+ = \Gamma$.

If $\Gamma_\mu^+$ is a finite index subgroup of $\Gamma$, the same proof almost works in $\Gamma_\mu^+$ to conclude that
$\Gamma_\mu^+$ is virtually free if $h = \ell v$, implying that $\Gamma$ is also virtually free. The only difficulty is
that the distance we are considering on $\Gamma_\mu^+$ is not a word distance for a system of generators
of $\Gamma_\mu^+$. However, the only properties of the distance we have really used are:

(1) It is hyperbolic and quasi-isometric to a word distance (to apply Theorem 1.2).

(2) The stable translation lengths are rational numbers with bounded denominators.

These two properties are clearly satisfied for the restriction of the distance $d$ to $\Gamma_\mu^+$. Hence,
the above proof also works in this case. This completes the proof of Theorem 1.5. □

Remark 3.5. If $\Lambda$ is a quasi-convex subgroup of a hyperbolic group $\Gamma$, then the restriction
to $\Lambda$ of a word distance on $\Gamma$ also satisfies the above two properties. Hence, Theorem 1.5
also holds in $\Lambda$ for such a distance.


4. Growth of non-distorted points in subgroups 

Our goal in this section is to prove Theorem 1.6 on the entropy of a random walk on
an infinite index subgroup $\Lambda$ of a hyperbolic group $\Gamma$. Since the geometry of such random
walks is complicated to describe in general, our argument is indirect: we will show that, in
any infinite index subgroup, the number of points that the random walk effectively visits
is exponentially small compared to the growth of $\Gamma$. This is trivial if the growth $v_\Lambda =
\liminf_{n\to\infty} \log|B_n \cap \Lambda|/n$ is strictly smaller than $v = v_\Gamma$. When $v_\Lambda = v$, on the other hand, we
will argue that the random walk does not typically visit all of $\Lambda$, but only a subset made
of non-distorted points. To prove Theorem 1.6, the main step is to show that, even when
$v_\Lambda = v$, the number of such non-distorted points is exponentially smaller than $e^{nv}$. We
introduce the notion of non-distorted points in Paragraph 4.1, prove this main geometric
estimate in Paragraph 4.2, and apply this to random walks in Paragraph 4.4. Paragraph 4.3
is devoted to the case $v_\Lambda < v$, where unexpected phenomena happen even in distorted
subgroups.





4.1. Non-distorted points. There are at least two different ways to define a notion of
non-distorted point.

Definition 4.1. Let $\Gamma$ be a finitely generated group endowed with a word distance $d = d_\Gamma$,
and let $\Lambda$ be a subgroup of $\Gamma$.

• For $\varepsilon > 0$ and $M > 0$, we say that $g \in \Lambda$ is $(\varepsilon, M)$-quasi-convex if any geodesic $\gamma$
from $e$ to $g$ spends at least a proportion $\varepsilon$ of its time in the $M$-neighborhood of $\Lambda$,
i.e.,

    $|\{i \in [1, |g|] : d(\gamma(i), \Lambda) \le M\}| \ge \varepsilon|g|.$

We write $\Lambda_{QC(\varepsilon,M)}$ for the set of points in $\Lambda$ which are $(\varepsilon, M)$-quasi-convex.

• Assume additionally that $\Lambda$ is finitely generated, and endowed with a word distance
$d_\Lambda$. For $D > 0$, we say that $g \in \Lambda$ is $D$-undistorted if $d_\Lambda(e,g) \le Dd_\Gamma(e,g)$. We
write $\Lambda_{UD(D)}$ for the set of $D$-undistorted points.

Up to a change in the constants, these notions do not depend on the choice of the distance
$d$. The first definition has the advantage of working for infinitely generated subgroups, but it
may seem less natural than the second one. If $\Lambda$ is a quasi-convex subgroup of a hyperbolic
group $\Gamma$, then all its points are $(1, M)$-quasi-convex if $M$ is large enough, and all its points
are also $D$-undistorted for large enough $D$. In the general case, a quasi-convex point does
not have to be undistorted: it may happen that the times $i$ such that $d(\gamma(i), \Lambda) \le M$ are all
included in $[1, |g|/2]$, while between $|g|/2$ and $|g|$ one needs to make a huge detour to follow
$\Lambda$, making $d_\Lambda(e,g)$ much larger than $d_\Gamma(e,g)$. On the other hand, an undistorted point is
automatically quasi-convex, at least in hyperbolic groups:

Proposition 4.2. Let T be a hyperbolic group, let A be a finitely generated subgroup ofT, 
and let D > 0. There exist £ > 0 and M > 0 such that any D-undistorted point is also 
(£,M)- quasi-convex, i.e., A UD(D) C A qc( £ ,m)- 

Proof. Consider $g \in \Lambda$ which is not $(\varepsilon, M)$-quasi-convex. We have to show that $d_\Lambda(e, g)$ is
much bigger than $n = d_\Gamma(e, g)$. The intuition is that, away from a $\Gamma$-geodesic from $e$ to $g$,
the progress towards $g$ is much slower by hyperbolicity.

Let us consider a geodesic from $e$ to $g$ in $\Lambda$, with length $d_\Lambda(e, g)$. Replacing each generator
of $\Lambda$ by the product of a uniformly bounded number of generators of $\Gamma$, we obtain a path
$\gamma_\Lambda$ in the Cayley graph of $\Gamma$, remaining in the $C_0$-neighborhood of $\Lambda$ (for some $C_0 > 0$) and
with length $|\gamma_\Lambda| \le C_0\, d_\Lambda(e, g)$.

Let us consider a geodesic $\gamma_\Gamma$ from $e$ to $g$ for the distance $d_\Gamma$. For each $x \in \Gamma$, we can
consider its projection $\pi(x)$ on $\gamma_\Gamma$, i.e., the point on $\gamma_\Gamma$ that is closest to $x$ (if several points
correspond, we take the closest one to $e$). This projection is 1-Lipschitz. In particular, the
projection of $\gamma_\Lambda$ covers the whole geodesic $\gamma_\Gamma$. For each $x_i \in \gamma_\Gamma$, let us consider the first
point $y_i \in \gamma_\Lambda$ projecting to $x_i$.

Let us fix an integer $L$, large enough with respect to the hyperbolicity constant of $\Gamma$.
Along $\gamma_\Gamma$, let us consider the points at distance $kL$ from $e$, i.e., $x_0 = e, x_L, x_{2L}, \dots, x_{mL}$
with $m = \lfloor n/L \rfloor$. In particular, $|\gamma_\Lambda| \ge \sum_{i=0}^{m-1} d_\Gamma(y_{iL}, y_{(i+1)L})$. Moreover, a tree approximation
shows that $d_\Gamma(y_{iL}, y_{(i+1)L}) \ge d_\Gamma(y_{iL}, x_{iL}) + L + d_\Gamma(x_{(i+1)L}, y_{(i+1)L}) - C_1$ (where $C_1$ only
depends on the hyperbolicity constant of $\Gamma$). Choosing $L \ge C_1$, we get

\[ |\gamma_\Lambda| \ge \sum_{i=0}^{m} d_\Gamma(x_{iL}, y_{iL}) \ge \sum_{i=0}^{m} \bigl( d_\Gamma(x_{iL}, \Lambda) - C_0 \bigr). \]

Since we assume that $g$ is not $(\varepsilon, M)$-quasi-convex, the set of indices $i$ with $d(x_i, \Lambda) \le M$
has cardinality at most $\varepsilon n$. Taking $M \ge C_0$, the previous equation is bounded from below
by

\[ (m + 1 - \varepsilon n) M - (m + 1) C_0 \ge (n/L - \varepsilon n) M - n C_0 / L. \]

Finally, we get

\[ d_\Lambda(e, g) \ge |\gamma_\Lambda| / C_0 \ge n (1/L - \varepsilon) M / C_0 - n/L. \]

If $\varepsilon$ is small enough and $M$ is large enough so that $(1/L - \varepsilon) M / C_0 - 1/L > D$, we obtain
$d_\Lambda(e, g) > Dn$, i.e., $g \notin \Lambda_{UD(D)}$, as desired. □

From this point on, we will mainly work with the notion of quasi-convex points, since
counting results on such points imply results on undistorted points by the previous
proposition.

4.2. Non-distorted points in subgroups with $v_\Lambda = v$. In this section, we show that
there are exponentially few quasi-convex points in infinite-index subgroups of hyperbolic
groups.

Theorem 4.3. Let $\Gamma$ be a nonelementary hyperbolic group endowed with a word distance.
Let $\Lambda$ be an infinite index subgroup of $\Gamma$. Then

\[ |B_n \cap \Lambda| = o(|B_n|). \tag{4.1} \]

Moreover, for all $\varepsilon > 0$ and $M > 0$, there exists $\eta > 0$ such that, for all large enough $n$,

\[ |B_n \cap \Lambda_{QC(\varepsilon,M)}| \le e^{-\eta n} |B_n|. \tag{4.2} \]

One may wonder why we put the estimate (4.1) in the statement of the theorem, while
the main emphasis is on counting quasi-convex points. It turns out that this estimate
is not trivial, and that its proof uses the same techniques as for the proof of (4.2). To
illustrate that it is not trivial, let us remark that this estimate is not true without the
hyperbolicity assumption. For instance, in $\Gamma = \mathbb{F}_2 \times \mathbb{Z}$ (with its canonical generating
system, and the corresponding word distance), the infinite index subgroup $\Lambda = \mathbb{F}_2$ satisfies
$|\Lambda \cap B_n| / |B_n| \to c > 0$.

Theorem 4.3 is trivial if the growth rate $v_\Lambda$ of $\Lambda$ is strictly smaller than the growth rate
$v$ of $\Gamma$, since in this case $|B_n \cap \Lambda|$ itself is exponentially smaller than $|B_n|$. However, this is
not always the case, even for finitely generated subgroups.

Consider for instance a compact hyperbolic 3-manifold which fibers over the circle,
obtained as a suspension of a hyperbolic surface with a pseudo-Anosov. Its fundamental group
$\Gamma$ surjects into $\mathbb{Z} = \pi_1(\mathbb{S}^1)$. The kernel $\Lambda$ of this morphism $\varphi$ is the fundamental group of
the fiber. It is finitely generated, with infinite index, and $|B_n \cap \Lambda| \sim c |B_n| / \sqrt{n}$, see [Sha98].

Heuristically, one can understand in this case why there are exponentially few
quasi-convex points in $\Lambda$. Let us consider a geodesic of length $n$ in $\Gamma$. It projects under $\varphi$ to a
path in $\mathbb{Z}$, which behaves roughly like a random walk. In particular, $|B_n \cap \Lambda| / |B_n|$ behaves
like the probability that a random walk on $\mathbb{Z}$ comes back to the identity at time $n$. This
is of order $1/\sqrt{n}$, in accordance with the rigorous results of [Sha98]. Such an element is
quasi-convex if the random walk in $\mathbb{Z}$ spends a big proportion of its time close to the origin.
A large deviation estimate shows that this is exponentially unlikely.
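
The heuristic can be tested on the simplest model, the simple random walk on $\mathbb{Z}$ with steps $\pm 1$. The following sketch is a toy computation under that assumption, and it says nothing about the actual projection of geodesics of $\Gamma$. It computes exactly the probability of being back at the origin at time $n$, and the probability of being back at time $n$ after having visited the origin at least $\varepsilon n$ times. The function names are chosen for the example.

```python
from math import comb


def return_probability(n: int) -> float:
    """P(S_n = 0) for the simple random walk on Z (steps +1/-1, probability 1/2 each)."""
    return comb(n, n // 2) / 2 ** n if n % 2 == 0 else 0.0


def sticky_return_probability(n: int, eps: float) -> float:
    """P(S_n = 0 and the walk sits at 0 at least eps * n times among times 1..n)."""
    # state[(position, visits)] = probability of being at `position` after the
    # steps taken so far, having been at 0 exactly `visits` times since time 1.
    state = {(0, 0): 1.0}
    for _ in range(n):
        nxt = {}
        for (pos, visits), prob in state.items():
            for step in (-1, 1):
                new_pos = pos + step
                new_visits = visits + (1 if new_pos == 0 else 0)
                key = (new_pos, new_visits)
                nxt[key] = nxt.get(key, 0.0) + prob / 2
        state = nxt
    return sum(p for (pos, visits), p in state.items()
               if pos == 0 and visits >= eps * n)
```

For $\varepsilon = 1/2$ and even $n$, the second probability equals $2^{-n/2}$, because the walk must be at the origin at every even time, whereas the first one only decays like $1/\sqrt{n}$.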

The proof of the theorem consists in making this heuristic precise, in the general case
where the subgroup $\Lambda$ is not normal (so that there is no morphism $\varphi$ at hand). An important
point in the proof is that a hyperbolic group is automatic, i.e., there exists a finite state
automaton that recognizes a system of geodesics parameterizing bijectively the points in
the group. Counting points in the group then amounts to a random walk on the graph of
this automaton, while counting points in $\Lambda$ amounts to a fibred random walk, on this graph
times $\Lambda \backslash \Gamma$. As this space is infinite, the random walk spends most of its time outside of
finite sets, i.e., far away from $\Lambda$.

To formalize this argument, we will reduce the question to Markov chains on graphs,
where we will use the following probabilistic lemma.

Lemma 4.4. Consider a Markov chain $(X_n)$ on a countable set $V$, with a stationary
measure $m$ (i.e., $m(x) = \sum_y m(y) p(y, x)$ for all $x$). Let $\tilde V$ be the set of points $x \in V$ such that
$\sum_{x \to y} m(y) = +\infty$, where we write $x \to y$ if there exists a positive probability path from $x$
to $y$. Then, for all $x \in V$ and $x' \in \tilde V$,

\[ \mathbb{P}_x(X_n = x') \to 0 \quad \text{when } n \to \infty. \tag{4.3} \]

Take $x \in \tilde V$ and $\varepsilon > 0$. There exists $\eta > 0$ such that, for all large enough $n$,

\[ \mathbb{P}_x(X_n = x \text{ and } X_i \text{ visits } x \text{ at least } \varepsilon n \text{ times in between}) \le e^{-\eta n}. \tag{4.4} \]

Proof. In countable state Markov chains, a point $x$ can be either transient, or null recurrent,
or positive recurrent. Let us first show that points in $\tilde V$ are not positive recurrent, by
contradiction. Otherwise, the points that can be reached from $x$ form an irreducible class $C$,
which admits a stationary probability measure $p$. The restriction of $m$ to $C$ is an excessive
measure. By uniqueness (see [Rev84, Theorem 3.1.9]), the measure $m$ is proportional on $C$
to $p$. In particular, it has finite mass there. This contradicts the assumption
$\sum_{x \to y} m(y) = +\infty$.

Let us now show that, for all $x \in V$ and $x' \in \tilde V$, the probability $\mathbb{P}_x(X_n = x')$ tends to 0.
Otherwise, conditioning on the first visit to $x'$ we deduce that $\mathbb{P}_{x'}(X_n = x')$ does not tend
to 0. This implies that $x'$ is positive recurrent, a contradiction.

Let us now prove (4.4). Consider $x \in \tilde V$, it is either transient or null recurrent. If it is
transient, the probability $p$ to come back to $x$ is $< 1$. Hence, the probability to come back
$\varepsilon n$ times is bounded by $p^{\varepsilon n}$, and is therefore exponentially small as desired.

Assume now that $x$ is null recurrent: almost surely, the Markov chain comes back to
$x$, but the waiting time $\tau$ has infinite expectation. Let $\tau_1, \tau_2, \dots$ be the lengths of the
successive excursions based at $x$. They are independent and distributed like $\tau$, by the Markov
property. The probability in (4.4) is bounded by $\mathbb{P}(\sum_{i=1}^{\varepsilon n} \tau_i \le n)$, which is bounded for any
$M$ by $\mathbb{P}(\sum_{i=1}^{\varepsilon n} \tau_i 1_{\tau_i \le M} \le n)$. The random variables $\tau_i 1_{\tau_i \le M}$ are bounded, independent and
identically distributed. If $M$ is large enough, they have expectation $> 1/\varepsilon$. A standard large
deviation result then shows that $\mathbb{P}(\sum_{i=1}^{\varepsilon n} \tau_i 1_{\tau_i \le M} \le n)$ is exponentially small, as desired. □

We will also need the following technical lemma, which was explained to us by B. Bekka.

Lemma 4.5. Let $\Lambda$ be a subgroup of a group $\Gamma$. Assume that there exists a finite subset $B$
of $\Gamma$ such that $B \Lambda B = \Gamma$. Then $\Lambda$ has finite index in $\Gamma$.

Proof. We have by assumption $\Gamma = \bigcup_{i,j} b_i \Lambda b_j = \bigcup_{i,j} \Lambda_i b_i b_j$, where $\Lambda_i = b_i \Lambda b_i^{-1}$ is a
conjugate of $\Lambda$ (and has therefore the same index). A theorem of Neumann [Neu54] ensures that
a group is never a finite union of right cosets of infinite index subgroups. Hence, one of the
$\Lambda_i$ has finite index in $\Gamma$, and so has $\Lambda$. □

Let $\Gamma$ be a hyperbolic group, with a finite generating set $S$. Consider a finite directed
graph $\mathcal{A} = (V, E, x_*)$ with vertex set $V$, edges $E$, a distinguished vertex $x_*$, and a labeling
$\alpha : E \to S$. We associate to any path $\gamma$ in the graph (i.e., a sequence of edges $\sigma_0, \sigma_1, \dots, \sigma_{m-1}$
where the endpoint of $\sigma_i$ is the beginning of $\sigma_{i+1}$) a path in the Cayley graph starting from
the identity and following the edges labeled $\alpha(\sigma_0)$, then $\alpha(\sigma_1)$, and so on. The endpoint of
this path is $\alpha_*(\gamma) := \alpha(\sigma_0) \cdots \alpha(\sigma_{m-1})$. We always assume that any point can be reached
by a path starting at $x_*$.

A hyperbolic group is automatic (see, for instance, [Cal13]): there exists such a graph
with the following properties.

(1) For any path $\gamma$ in the graph, the corresponding path $\alpha(\gamma)$ is geodesic in the Cayley
graph.

(2) The map $\alpha_*$ induces a bijection between the set of paths in the graph starting from
$x_*$ and the group $\Gamma$.

In particular, the paths of length $n$ in the graph originating from $x_*$ parameterize the
sphere $\mathbb{S}_n$ of radius $n$ in the group. The existence of such a structure makes it for instance
possible to prove that the growth series of a hyperbolic group is rational. We will use such
an automaton to count the points in the subgroup $\Lambda$, and in particular the quasi-convex
points.

We define a transition matrix $A$, indexed by $V$. By definition, $A_{xy}$ is the number of edges
from $x$ to $y$. Hence, $(A^n)_{xy}$ is the number of paths of length $n$ from $x$ to $y$. In particular,
the number of paths of length $n$ starting from $x_*$ is $\sum_y (A^n)_{x_* y}$. Write $u$ for the line vector
with 1 at position $x_*$ and 0 elsewhere, and $\tilde u$ for the column vector with 1 everywhere. This
number of paths reads $u A^n \tilde u$. Therefore, $|\mathbb{S}_n| = u A^n \tilde u$, proving the rationality of the growth
function of the group. Let $v$ be the growth rate of balls in $\Gamma$. It satisfies $|B_n| \le C e^{nv}$,
by [Coo93]. Hence, the spectral radius of $A$ is $e^v$, and $A$ has no Jordan block for this
maximal eigenvalue.
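
To make the counting formula concrete, here is a minimal sketch for the free group $\mathbb{F}_2$ with its standard generators, a case chosen for the example and not treated in the text. The automaton has a start vertex and one vertex per generator (recording the last letter read), and its paths are exactly the reduced words. The helper names are assumptions of the sketch.

```python
def mat_mul(X, Y):
    """Product of two square integer matrices given as lists of rows."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(X, n):
    """n-th power of a square integer matrix, by repeated multiplication."""
    size = len(X)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, X)
    return result


# Vertices: 0 = x_* (start), 1 = a, 2 = a^{-1}, 3 = b, 4 = b^{-1}.
INVERSE = {1: 2, 2: 1, 3: 4, 4: 3}
A = [[0] * 5 for _ in range(5)]
for y in range(1, 5):
    A[0][y] = 1                  # first letter: any generator
    for x in range(1, 5):
        if y != INVERSE[x]:
            A[x][y] = 1          # next letter: anything but the inverse of the last one


def sphere_size(n):
    """|S_n| = u A^n u~, with u the indicator of x_* and u~ the all-ones column."""
    return sum(mat_pow(A, n)[0])
```

In this example the spheres have cardinality $4 \cdot 3^{n-1}$ for $n \ge 1$, so that $e^v = 3$.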

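To understand the points of the infinite index subgroup $\Lambda$ of $\Gamma$, we consider an extension
$\mathcal{A}_\Lambda$ of $\mathcal{A}$, with fibers $\Lambda \backslash \Gamma$. Its vertex set $V_\Lambda$ is made of the pairs $(x, \Lambda g) \in V \times \Lambda \backslash \Gamma$. For any
edge $\sigma$ in $\mathcal{A}$, going from $x$ to $y$ and with label $\alpha(\sigma)$, we put for any $g \in \Gamma$ an edge in $\mathcal{A}_\Lambda$
from $(x, \Lambda g)$ to $(y, \Lambda g \alpha(\sigma))$. A path $\gamma$ in $\mathcal{A}$, from $x$ to $y$, lifts to a path $\tilde\gamma$ in $\mathcal{A}_\Lambda$ originating
from $(x, \Lambda e)$. By construction, its endpoint is $(y, \Lambda \alpha_*(\gamma))$. This shows that the paths in the
graph $\mathcal{A}_\Lambda$ remember the current right coset of $\Lambda$.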
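
The fibred graph can be explored by dynamic programming on the pairs (vertex, coset). The sketch below does this in a toy case that is an assumption of the example: $\Gamma = \mathbb{F}_2$ and $\Lambda$ the kernel of the morphism sending $a$ to 1 and $b$ to 0, so that $\Lambda$ is normal and the coset $\Lambda g$ can be recorded by an integer. In the setting of the text no such morphism is available in general.

```python
from itertools import product

LETTERS = ["a", "A", "b", "B"]                    # A = a^{-1}, B = b^{-1}
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}
PHI = {"a": 1, "A": -1, "b": 0, "B": 0}           # hypothetical morphism F_2 -> Z


def sphere_in_kernel(n):
    """Number of reduced words of length n lying in Lambda = ker(PHI)."""
    # Vertices of the fibred graph: (last letter or None, integer labelling the coset).
    counts = {(None, 0): 1}
    for _ in range(n):
        nxt = {}
        for (last, coset), c in counts.items():
            for s in LETTERS:
                if last is not None and s == INVERSE[last]:
                    continue                      # no edge: the word would not be reduced
                key = (s, coset + PHI[s])         # lifted edge (x, Lambda g) -> (y, Lambda g s)
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return sum(c for (_, coset), c in counts.items() if coset == 0)


def brute_force(n):
    """Same count obtained by listing all reduced words (only for small n)."""
    total = 0
    for word in product(LETTERS, repeat=n):
        reduced = all(word[i + 1] != INVERSE[word[i]] for i in range(n - 1))
        if reduced and sum(PHI[s] for s in word) == 0:
            total += 1
    return total
```

The paths counted by `sphere_in_kernel` are those that end in the fiber of the trivial coset, which is how points of $\mathbb{S}_n \cap \Lambda$ are counted in the proof of Theorem 4.3.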

The next lemma proves that the relevant components of this fibred graph are infinite.

Lemma 4.6. Let $\tilde x_0 = (x_0, \Lambda g_0)$ belong to $\mathcal{A}_\Lambda$. Let $\mathcal{C}$ be the component of $x_0$ in $\mathcal{A}$ (i.e., the
set of points that can be reached from $x_0$ and from which one can go back to $x_0$). Let $A_\mathcal{C}$ be
the restriction of the matrix $A$ to the points in $\mathcal{C}$. Assume that its spectral radius $\rho(A_\mathcal{C})$ is
equal to $e^v$. Then, starting from $\tilde x_0$ in the graph $\mathcal{C}_\Lambda$ (the restriction of $\mathcal{A}_\Lambda$ to $\mathcal{C} \times \Lambda \backslash \Gamma$), one
can reach infinitely many different points of $\mathcal{C}_\Lambda$.

Proof. It suffices to show that one can reach infinitely many points whose component in $\mathcal{C}$
is $x_0$. Assume by contradiction that one can only reach a finite number of classes $(x_0, \Lambda g_i)$.

Given $w \in \Gamma$ and $C > 0$, let $Y_{w,C}$ be the set of points in $\Gamma$ that have a geodesic expression
in which, for any subword $\tilde w$ of this expression and for any $a$, $b$ with length at most $C$, one
has $\tilde w \ne a w b$. In other words, the points in $Y_{w,C}$ are those that never see $w$ (nor even a
thickening of $w$ of size $C$) in their geodesic expressions. Theorem 3 in [AL02] proves the
existence of $C_0$ such that, for any $w$, the quantity $|B_n \cap Y_{w,C_0}| / |B_n|$ tends to 0 (the important
point is that $C_0$ does not depend on $w$).

The number of paths in $\mathcal{C}$ originating from $x_0$ grows at least like $c |B_n|$ since the spectral
radius of $A_\mathcal{C}$ is $e^v$. These paths give rise to distinct points in $\Gamma$. Hence, there exists such a
path $\gamma_0$ such that $\alpha_*(\gamma_0) \notin Y_{w,C_0}$. In particular, there exists a subpath $\gamma_1$ such that $\alpha_*(\gamma_1)$
can be written as $a_1 w b_1$ with $|a_1| \le C_0$ and $|b_1| \le C_0$. We can choose a path from $x_0$ to the
starting point of $\gamma_1$, with fixed length (since $\mathcal{C}$ is finite), and another path from the endpoint
of $\gamma_1$ to $x_0$. Concatenating them, we get a path $\gamma_2$ from $x_0$ to itself with $\alpha_*(\gamma_2) = a_2 w b_2$
and $|a_2|, |b_2| \le C_1 = C_0 + 2 \operatorname{diam}(\mathcal{C})$. By assumption, $\Lambda g_0 \alpha_*(\gamma_2)$ is one of the finitely many
$\Lambda g_i$ since we are returning to $x_0$. Hence, there exists $\lambda \in \Lambda$ such that $g_0 a_2 w b_2 = \lambda g_i$. This
shows that $w \in B \Lambda B$, where $B$ is the ball of radius $C_1 + \max_i d(e, g_i)$. As this holds for any
$w$, we have proved that $B \Lambda B = \Gamma$. By Lemma 4.5, this shows that $\Lambda$ has finite index in $\Gamma$,
a contradiction. □


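Lemma 4.7. Let $K(n, \tilde x_0, \varepsilon_0)$ denote the set of paths in $\mathcal{A}_\Lambda$ starting at a point $\tilde x_0$, of length
$n$, coming back to $\tilde x_0$ at time $n$, and spending a proportion at least $\varepsilon_0$ of the time at $\tilde x_0$.
Consider $\tilde x_0 \in \mathcal{A}_\Lambda$ and $\varepsilon_0 > 0$. Then there exist $\eta > 0$ and $C > 0$ such that, for all $n \in \mathbb{N}$,

\[ |K(n, \tilde x_0, \varepsilon_0)| \le C e^{n(v - \eta)}. \]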


Proof. Write $\tilde x_0 = (x_0, \Lambda g_0)$, let $\mathcal{C}$ be the component of $x_0$ in $\mathcal{A}$. If the spectral radius of
the restricted transition matrix $A_\mathcal{C}$ is $< e^v$, we simply bound $|K(n, \tilde x_0, \varepsilon_0)|$ by the number
of paths in $\mathcal{C}$ from $x_0$ to itself. This is at most $\|A_\mathcal{C}^n\|$, which is exponentially smaller than
$e^{nv}$ as desired.

Assume now that $\rho(A_\mathcal{C}) = e^v$. We will understand the number of paths in $\mathcal{C}$ (and in its lift
$\mathcal{C}_\Lambda$) in terms of a Markov chain. The matrix $A_\mathcal{C}$ has a unique eigenvector $q$ corresponding
to the eigenvalue $e^v$, it is positive by Perron-Frobenius's theorem. By definition,
$p(x, y) = e^{-v} A_{xy} q(y) / q(x)$ satisfies, for any $x \in \mathcal{C}$,

\[ \sum_{y \in \mathcal{C}} p(x, y) = \frac{e^{-v} (A_\mathcal{C} q)(x)}{q(x)} = 1. \]

This means that $p(x, y)$ is a transition kernel on $\mathcal{C}$. Denote by $(X_n)_{n \in \mathbb{N}}$ the corresponding
Markov chain. By construction,

\[ \mathbb{P}_x(X_n = y) = e^{-nv} (A^n)_{xy}\, q(y) / q(x). \]

Moreover, $(A^n)_{xy}$ is the number of paths of length $n$ in $\mathcal{A}$ from $x$ to $y$. Hence, up to a
bounded multiplicative factor $q(y)/q(x)$, the transition probabilities of the Markov chain $X_n$
count the number of paths in the graph $\mathcal{C}$. Let $m$ denote the unique stationary probability
for the Markov chain on $\mathcal{C}$.

We lift everything to $\mathcal{C}_\Lambda$, assigning to an edge the transition probability of its projection in
$\mathcal{C}$. The stationary measure $m$ lifts to a stationary measure $m_\Lambda$, which is simply the product
of $m$ and of the counting measure in the direction $\Lambda \backslash \Gamma$. Denoting by $X_n^\Lambda$ the Markov chain
in $\mathcal{C}_\Lambda$, we have

\[ e^{-nv} |K(n, \tilde x_0, \varepsilon_0)| = \mathbb{P}_{\tilde x_0}(X_n^\Lambda = \tilde x_0 \text{ and } X_i^\Lambda \text{ visits } \tilde x_0 \text{ at least } \varepsilon_0 n \text{ times in between}). \]

By Lemma 4.6, the Markov chain starting from $\tilde x_0$ can reach infinitely many points.
Equivalently, since $m$ is bounded from below, it can reach a set of infinite $m_\Lambda$-measure. Therefore,
Lemma 4.4 applies, and shows that the above quantity is exponentially small. □
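
The normalization used in the proof of Lemma 4.7, which turns a nonnegative matrix into a transition kernel by means of its Perron-Frobenius eigenvector, can be checked numerically. The sketch below is only an illustration: it uses a plain power iteration, assumes that the matrix is primitive so that the iteration converges, and the $2 \times 2$ matrix in the usage note is an arbitrary example, not an automaton coming from a group.

```python
def perron(A, iterations=200):
    """Power iteration: approximate the top eigenvalue and a positive right
    eigenvector of a nonnegative primitive matrix A (given as a list of rows)."""
    size = len(A)
    q = [1.0] * size
    lam = 1.0
    for _ in range(iterations):
        Aq = [sum(A[x][y] * q[y] for y in range(size)) for x in range(size)]
        lam = max(Aq)                    # q has maximum 1, so max(Aq) approximates lam
        q = [value / lam for value in Aq]
    return lam, q


def markov_kernel(A):
    """p(x, y) = A_xy q(y) / (lam q(x)), where lam plays the role of e^v."""
    lam, q = perron(A)
    size = len(A)
    return [[A[x][y] * q[y] / (lam * q[x]) for y in range(size)]
            for x in range(size)]
```

For the matrix `[[1, 1], [1, 0]]`, the eigenvalue is the golden ratio and each row of `markov_kernel([[1, 1], [1, 0]])` sums to 1 up to rounding errors.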

Proof of Theorem 4.3. Let us first prove (4.2). Counting the points in $\mathbb{S}_n \cap \Lambda_{QC(\varepsilon,M)}$ amounts
to counting the paths of length $n$ in $\mathcal{A}_\Lambda$, starting from $(x_*, \Lambda e)$ and spending a proportion
at least $\varepsilon$ of their time in the finite subset $F = V \times \Lambda B_M \subset V_\Lambda$. Such a path spends a
proportion at least $\varepsilon_0 = \varepsilon / |F|$ of its time at a given point $\tilde x \in F$. Let $k$ and $k + m$ denote
the first and last visits to $\tilde x$ (with $m \ge \varepsilon_0 n$ since there are at least $\varepsilon_0 n$ visits). Such a path
is the concatenation of a path from $(x_*, \Lambda e)$ to $\tilde x$ of length $k$ (their number is bounded by
the corresponding number of paths in $\mathcal{A}$, at most $\|A^k\| \le C e^{kv}$), of a path in $K(m, \tilde x, \varepsilon_0)$,
and of a path starting from $\tilde x$ of length $n - k - m$ (their number is again bounded by the
number of corresponding paths in $\mathcal{A}$, at most $C e^{(n-k-m)v}$). Hence, their number is at most
$C e^{(n-m)v} |K(m, \tilde x, \varepsilon_0)|$. Summing over the points $\tilde x \in F$, over the at most $n$ possible values
of $k$, and the values of $m$, we get the inequality

\[ |\mathbb{S}_n \cap \Lambda_{QC(\varepsilon,M)}| \le C n e^{nv} \sum_{\tilde x \in F} \sum_{m = \varepsilon_0 n}^{n} e^{-mv} |K(m, \tilde x, \varepsilon_0)|. \]

Lemma 4.7 shows that this is exponentially smaller than $e^{nv}$.

Let us now prove (4.1), using similar arguments. A point in $\mathbb{S}_n \cap \Lambda$ corresponds to a
path of length $n$ in $\mathcal{A}_\Lambda$, starting from $(x_*, \Lambda e)$ and ending at a point $(x, \Lambda e)$. We say that a
component $\mathcal{C}$ in the graph $\mathcal{A}$ is maximal if the spectral radius of the corresponding restricted
matrix $A_\mathcal{C}$ is $e^v$. Since the matrix $A$ has no Jordan block corresponding to the eigenvalue $e^v$,
a path in the graph encounters at most one maximal component. The paths in $\mathcal{A}_\Lambda$ whose
projection in $\mathcal{A}$ spends a time $k$ in non-maximal components give an overall contribution
to $|\mathbb{S}_n \cap \Lambda|$ bounded by $C e^{(n-k)v + k(v - \eta)} \le C e^{-\eta k} |B_n|$. Given $\varepsilon > 0$, their contribution for
$k \ge k_0(\varepsilon)$ is bounded by $\varepsilon |B_n|$. Hence, it suffices to control the paths for fixed $k$. Let us fix
the beginning of such a path, from $(x_*, \Lambda e)$ to a point $(x_0, \Lambda g_0)$ where $x_0$ is in a maximal
component $\mathcal{C}$, and its end from $(x_1, \Lambda g_1)$ with $x_1 \in \mathcal{C}$ to a point $(x, \Lambda e)$. To conclude, one
should show that the number of paths of length $n$ from $(x_0, \Lambda g_0)$ to $(x_1, \Lambda g_1)$ is $o(e^{nv})$. This
follows from the probabilistic interpretation in the proof of Lemma 4.7 and from (4.3). □

4.3. Non-distorted points in subgroups with $v_\Lambda < v$. Let $\Lambda$ be a subgroup of a
hyperbolic group $\Gamma$. Let $v_\Lambda$ and $v_\Gamma$ be their respective growths, for a word distance on $\Gamma$. If
$v_\Lambda = v_\Gamma$, Theorem 4.3 proves that there is a dichotomy:

(1) Either $\Lambda$ is quasi-convex (equivalently, $\Lambda$ has finite index in $\Gamma$). Then $|B_n \cap \Lambda| \ge
c e^{n v_\Lambda}$, and all points in $\Lambda$ are quasi-convex.
(2) Or $\Lambda$ is not quasi-convex (equivalently, it has infinite index in $\Gamma$). Then $|B_n \cap \Lambda| =
o(e^{n v_\Lambda})$, and there are exponentially few quasi-convex points in $\Lambda$.

Consider now a general subgroup $\Lambda$ with $v_\Lambda < v_\Gamma$. If it is quasi-convex, then (1) above
is still satisfied: $|B_n \cap \Lambda| \ge c e^{n v_\Lambda}$ by [Coo93], and all points in $\Lambda$ are quasi-convex. One
may ask if these properties are equivalent, and if they characterize quasi-convex subgroups.
This question is reminiscent of a question of Sullivan in hyperbolic geometry: Are convex
cocompact groups the only ones to have finite Patterson-Sullivan measure? Peigné showed
in [Pei03] that the answer to this question is negative. His counterexamples adapt to our
situation, giving also a negative answer to our question.


Proposition 4.8. There exists a finitely generated subgroup $\Lambda$ of a hyperbolic group $\Gamma$
endowed with a word distance, which is not quasi-convex, but for which
$C^{-1} e^{n v_\Lambda} \le |B_n \cap \Lambda| \le C e^{n v_\Lambda}$. Moreover, most points of $\Lambda$ are quasi-convex: there exist
$\varepsilon$ and $\eta$ such that

\[ |B_n \cap (\Lambda \setminus \Lambda_{QC(\varepsilon,0)})| \le C e^{n(v_\Lambda - \eta)}. \tag{4.5} \]

Proof. The example is the same as in [Pei03], but his geometric proofs are replaced by
combinatorial arguments based on generating series.

Let $G$ be a finitely generated non-quasi-convex subgroup of a hyperbolic group $\tilde G$ (take
for instance for $\tilde G$ the fundamental group of a hyperbolic 3-manifold which fibers over the
circle, and for $G$ the fundamental group of the fiber of this fibration). Let $H = \mathbb{F}_k$, with $k$
large enough so that $v_H \ge v_G$. We take $\Lambda = G * H \subset \Gamma = \tilde G * H$. It is not quasi-convex,
because of the factor $G$. Writing $v_\Lambda$ for its growth, we claim that, for some $c > 0$,

\[ |\mathbb{S}_n \cap \Lambda| \sim c e^{n v_\Lambda}. \tag{4.6} \]

We compute with generating series. Let $F_G(z)$ be the growth series for $G$, given by
$F_G(z) = \sum_{n \ge 0} |\mathbb{S}_n \cap G| z^n$. Likewise, we define $F_H$ and $F_\Lambda$. Since any word in $\Lambda$ has a canonical
decomposition in terms of words in $G$ and $H$, a classical computation (see [dlH00, Prop.
VI.A.4]) gives

\[ F_\Lambda = \frac{F_G F_H}{1 - (F_G - 1)(F_H - 1)}. \tag{4.7} \]

Let $z_G = e^{-v_G} \ge z_H = e^{-v_H}$ be the convergence radii of $F_G$ and $F_H$. At $z_H$, we have
$F_H(z_H) = +\infty$, since the cardinality of spheres in the free group is exactly of the order of
$e^{n v_H}$. When $z$ increases to $z_H$, the function $(F_G(z) - 1)(F_H(z) - 1)$ takes the value 1, at a
number $z = z_\Lambda$. Since this is the first singularity of $F_\Lambda$, we have $z_\Lambda = e^{-v_\Lambda}$. Moreover, the
function $F_\Lambda$ is meromorphic at $z_\Lambda$, with a pole of order 1 (since the function $(F_G - 1)(F_H - 1)$
has positive derivative, being a power series with nonnegative coefficients). It follows from
a simple tauberian theorem (see, for instance, [FS09, Theorem IV.10]) that the coefficients
of $F_\Lambda$ behave like $c z_\Lambda^{-n}$, proving (4.6).

Let us estimate the number of non-quasi-convex points in $\Lambda$. Consider a word $w \in \Lambda$ of
length $n$, for instance starting with a factor in $G$ and ending with a factor in $H$. It can be
written as $g_1 h_1 g_2 h_2 \cdots g_s h_s$. Along a geodesic from $e$ to $w$, all the words $g_1 h$ (with $h$ prefix
of $h_1$) belong to $\Lambda$. So do all the words $g_1 h_1 g_2 h$ with $h$ prefix of $h_2$, and so on. Therefore,
the proportion of time that the geodesic spends outside of $\Lambda$ is at most $\sum |g_i| / n$. Such a
point in $\Lambda \setminus \Lambda_{QC(\varepsilon,0)}$ satisfies $\sum |g_i| \ge (1 - \varepsilon) n$ and $\sum |h_i| \le \varepsilon n$. Assuming $\varepsilon \le 1/2$, this
gives $\sum |h_i| \le 2\varepsilon \sum |g_i|$. In particular, for any $\alpha > 0$, we have
$e^{\alpha (\sum |g_i| - (2\varepsilon)^{-1} \sum |h_i|)} \ge 1$. Let
$u_n = |\mathbb{S}_n \cap (\Lambda \setminus \Lambda_{QC(\varepsilon,0)})|$, its generating series satisfies the following equation (where we only
write in details the words starting with $G$ and ending in $H$, the other ones being completely
analogous):

\[ \sum_n u_n z^n \le \sum_{\ell} \sum_{a_1 + b_1 + a_2 + \cdots + b_\ell = n} e^{\alpha (\sum a_i - (2\varepsilon)^{-1} \sum b_i)} |\mathbb{S}_{a_1} \cap G| \, |\mathbb{S}_{b_1} \cap H| \cdots |\mathbb{S}_{b_\ell} \cap H| \, z^n + \dots \]
\[ = \sum_{\ell} \bigl[ (F_G(e^{\alpha} z) - 1)(F_H(e^{-\alpha (2\varepsilon)^{-1}} z) - 1) \bigr]^{\ell} + \dots \]
\[ = \frac{F_G(e^{\alpha} z) \, F_H(e^{-\alpha (2\varepsilon)^{-1}} z)}{1 - (F_G(e^{\alpha} z) - 1)(F_H(e^{-\alpha (2\varepsilon)^{-1}} z) - 1)}. \]

This is the same formula as in (4.7), but the factor $z$ has been shifted in $F_G$ and $F_H$. Choose
$\alpha > 0$ such that $e^{\alpha} z_\Lambda < z_G$, and then $\varepsilon$ small enough so that
$(F_G(e^{\alpha} z_\Lambda) - 1)(F_H(e^{-\alpha (2\varepsilon)^{-1}} z_\Lambda) - 1) < 1$. We deduce that the series $\sum u_n z^n$ converges for $z = z_\Lambda$, and even slightly to its
right. It follows that $u_n$ is exponentially small compared to $z_\Lambda^{-n}$. This proves (4.5). □
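
The free product formula (4.7) can be checked on truncated power series. The sketch below is a sanity check in the simplest case, chosen for the example and much simpler than the groups of Proposition 4.8: $G = H = \mathbb{Z}$ with their standard generators, so that $G * H = \mathbb{F}_2$. The helper names are assumptions of the sketch.

```python
def series_mul(f, g):
    """Product of two power series, truncated to len(f) coefficients."""
    n = len(f)
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(n)]


def series_inverse(f):
    """Inverse of a power series with constant coefficient 1 (integer coefficients)."""
    n = len(f)
    inv = [1] + [0] * (n - 1)
    for k in range(1, n):
        inv[k] = -sum(f[i] * inv[k - i] for i in range(1, k + 1))
    return inv


def free_product_growth(FG, FH):
    """Coefficients of F_G F_H / (1 - (F_G - 1)(F_H - 1)), as in formula (4.7)."""
    GH = series_mul([c - (k == 0) for k, c in enumerate(FG)],
                    [c - (k == 0) for k, c in enumerate(FH)])
    denominator = [(k == 0) - c for k, c in enumerate(GH)]
    return series_mul(series_mul(FG, FH), series_inverse(denominator))


N = 8
F_Z = [1] + [2] * (N - 1)      # growth series of Z: spheres of sizes 1, 2, 2, ...
print(free_product_growth(F_Z, F_Z))
```

With these inputs, the coefficients are the sphere sizes $1, 4, 12, 36, \dots$ of $\mathbb{F}_2$, in accordance with the closed form $(1+z)/(1-3z)$ obtained from (4.7) when $F_G = F_H = (1+z)/(1-z)$.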

4.4. Application to random walks in infinite index subgroups. In this paragraph,
we use Theorem 4.3 to prove Theorem 1.6 on random walks given by a measure $\mu$ on a
hyperbolic group $\Gamma$, assuming that $\Gamma_\mu$ has infinite index in $\Gamma$.

Before proving Theorem 1.6, we give another easier result, pertaining to the case where
$\mu$ has a finite moment for a word distance on $\Gamma_\mu$ (which should be finitely generated): In
this case, the random walk typically visits undistorted points. This easy statement is not
used later on, but it gives a heuristic explanation to Theorem 1.6.

Lemma 4.9. Let $\Lambda$ be a finitely generated subgroup of a finitely generated group $\Gamma$. Let $d_\Lambda$
and $d_\Gamma$ be the two corresponding word distances. Consider a probability measure $\mu$ on $\Lambda$,
with a moment of order 1 for $d_\Lambda$ (and therefore for $d_\Gamma$), with nonzero drift for $d_\Gamma$. Let $X_n$
denote the corresponding random walk. There exists $D > 0$ such that $\mathbb{P}(X_n \in \Lambda_{UD(D)}) \to 1$.

Proof. Almost surely, $d_\Gamma(e, X_n) \sim \ell_\Gamma n$, for some nonzero drift $\ell_\Gamma$. In the same way,
$d_\Lambda(e, X_n) \sim \ell_\Lambda n$. For any $D > \ell_\Lambda / \ell_\Gamma$, we get almost surely $d_\Lambda(e, X_n) \le D\, d_\Gamma(e, X_n)$ for
large enough $n$, i.e., $X_n \in \Lambda_{UD(D)}$. □

This lemma readily implies Theorem 1.6 under the additional assumption that $\Lambda$ is finitely
generated and that $\mu$ has a moment of order 1 for $d_\Lambda$. Indeed, for large $n$, with probability
at least 1/2, the point $X_n$ belongs to $B_{(\ell + \varepsilon) n} \cap \Lambda_{UD(D)}$, whose cardinality is bounded by
$C e^{(\ell + \varepsilon) n (v - \eta)}$ according to Theorem 4.3 and Proposition 4.2. Lemma 2.4 yields
$h \le (\ell + \varepsilon)(v - \eta)$, hence $h \le \ell (v - \eta) < \ell v$, completing the proof.

However, the assumptions of Theorem 1.6 are much weaker: even when $\Lambda$ is finitely
generated, it is much more restrictive to require a moment of order 1 on $\Lambda$ than on $\Gamma$,
precisely because the $\Gamma$-distance is smaller than the $\Lambda$-distance on distorted points, which
make up most of $\Lambda$. The general proof will not use undistorted points (which are not even
defined when $\Lambda$ is not finitely generated), but rather quasi-convex points: we will show
that, typically, the random walk concentrates on quasi-convex points. With the previous
argument, Theorem 1.6 readily follows from the next lemma.

Lemma 4.10. Let $\Lambda$ be a subgroup of a hyperbolic group $\Gamma$ endowed with a word distance
$d = d_\Gamma$. Let us consider a probability measure $\mu$ on $\Lambda$, with a moment of order 1 for $d_\Gamma$.
There exist $\varepsilon > 0$ and $M > 0$ such that $\mathbb{P}(X_n \in \Lambda_{QC(\varepsilon,M)}) \ge 1/2$ for large enough $n$.

Proof. The lemma is trivial if $\mu$ is elementary, since all the elements of $\Gamma_\mu \subset \Lambda$ are then
quasi-convex. We may therefore assume that $\mu$ is non-elementary.

The random walk at time $n$ is given by $X_n = g_1 \cdots g_n$, where the $g_i$ are independent and
distributed like $\mu$. We will show that most products $g_1 \cdots g_i$ (which belong to $\Lambda$) are within
distance $M$ of a geodesic from $e$ to $X_n$ (this amounts to the classical fact that trajectories
of the random walk follow geodesics in the group), and moreover that they approximate a
proportion at least $\varepsilon$ of the points on this geodesic. This will give $X_n \in \Lambda_{QC(\varepsilon,M)}$ as desired.
The second point is more delicate: we should for instance exclude the situation where, given
a geodesic $\gamma$, one has $X_n = \gamma(a(n))$ where $a(n)$ is the smallest square larger than $n$. In this
case, $X_n$ follows the geodesic $\gamma$ at linear speed, but nevertheless the proportion of $\gamma$ it visits
tends to 0. This behavior will be excluded thanks to the fact that, with high probability,
the jumps of the random walk are bounded.

The argument is probabilistic and formulated in terms of the bilateral version of the
random walk. On $\Omega = \Gamma^{\mathbb{Z}}$ with the product measure $\mathbb{P} = \mu^{\otimes \mathbb{Z}}$, let $g_n$ be the $n$-th coordinate.
The $g_n$ are independent, identically distributed, and correspond to the increments of a
random walk $(X_n)_{n \in \mathbb{Z}}$ with $X_0 = e$ and $X_n^{-1} X_{n+1} = g_{n+1}$. Almost surely, $X_n$ converges
when $n \to \pm\infty$ towards two random variables $\xi^\pm \in \partial\Gamma$, with $\xi^+ \ne \xi^-$ almost surely since
these random variables are independent and atomless. Following Kaimanovich [Kai00],
denote by $S(\xi^-, \xi^+)$ the union of all the geodesics from $\xi^-$ to $\xi^+$. Let $\pi$ be the projection
on $S(\xi^-, \xi^+)$, i.e., $\pi(g)$ is the closest point to $g$ on $S(\xi^-, \xi^+)$. It is not uniquely defined, but
two possible choices are within distance $C_0$, for some $C_0$ only depending on $\Gamma$.

Let us choose $L > 0$ large enough (how large will only depend on the hyperbolicity
constant of the space). Any measurable function is bounded on sets with arbitrarily large
measure. Hence, there exists $K > 0$ such that, with probability at least 9/10,

(1) For every $|k| \ge K$, the projections $\pi(X_k)$ are distant from $\pi(X_0)$ by at least $L$ (and
they are closer to $\xi^+$ if $k > 0$, and to $\xi^-$ if $k < 0$).

(2) We have $d(e, S(\xi^-, \xi^+)) \le K$.

As everything is equivariant, we deduce that, for all $i \in \mathbb{Z}$, the point $X_i$ satisfies the same
properties with probability at least 9/10, i.e.,

\[ d(X_i, S(\xi^-, \xi^+)) \le K \text{ and, for all } |k| \ge K, \ d(\pi(X_i), \pi(X_{i+k})) \ge L. \tag{4.8} \]

Let $n$ be a large integer. Write $m = \lfloor n/K \rfloor$. Among the integers $K, 2K, \dots, mK \le n$, we
consider the set $I_n(\omega)$ of those $i$ such that $X_i$ satisfies (4.8). We have $\mathbb{E}(|I_n|) \ge m \cdot 9/10$.
As $|I_n| \le m$, we get

\[ \frac{9m}{10} \le \mathbb{E}(|I_n|) \le \frac{m}{10} \mathbb{P}(|I_n| < m/10) + m \mathbb{P}(|I_n| \ge m/10) = \frac{m}{10} + \frac{9m}{10} \mathbb{P}(|I_n| \ge m/10). \]

This gives $\mathbb{P}(|I_n| \ge m/10) \ge 8/9$. Let $\eta = 1/(20K)$. Let $\Omega_n$ be the set of $\omega$ such that
$|I_n(\omega)| \ge \eta n + 1$, and $X_0$ and $X_n$ satisfy (4.8), and $d(X_n, e) \le 2\ell n$ (where $\ell$ is the drift of
$\mu$). It satisfies $\mathbb{P}(\Omega_n) \ge 1/2$ if $n$ is large enough. This is the set of good trajectories for
which we can control the position of many of the $X_i$.

Figure 1. The projections on $\gamma$ and $S$


Let $\omega \in \Omega_n$. We write $Y_i$ for a projection of $X_i$ on a geodesic $\gamma$ from $e$ to $X_n$. Let
$I'_n = I_n \setminus \{mK\}$, so that the elements of $I'_n$ are at distance at least $K$ of 0 and $n$. As $X_0$
and $X_n$ satisfy (4.8), the projections $\pi(X_i)$ for $i \in I'_n$ are located between $\pi(X_0)$ and $\pi(X_n)$,
and are at a distance at least $L$ of these points (see Figure 1). If $L$ is large enough, we
obtain $d(\pi(X_i), Y_i) \le C_1$ by hyperbolicity, where $C_1$ only depends on $\Gamma$. This gives

\[ d(Y_i, \Lambda) \le d(Y_i, \pi(X_i)) + d(\pi(X_i), X_i) \le C_1 + K, \]

thanks to (4.8) for $X_i$. When $i \ne j$ belong to $I'_n$, we have $d(\pi(X_i), \pi(X_j)) \ge L$ again thanks
to (4.8), hence $d(Y_i, Y_j) \ge L - 2C_1$. If $L$ was chosen larger than $2C_1 + 1$, this shows that
$Y_i \ne Y_j$. We have found along $\gamma$ at least $|I_n| - 1$ distinct points, within distance $C_1 + K$ of
$\Lambda$. Moreover, for large enough $n$,

\[ |I_n| - 1 \ge \eta n \ge 2\ell n \cdot (\eta / 2\ell) \ge d(e, X_n) \cdot (\eta / 2\ell). \]

Let $\varepsilon = \eta / 2\ell$ and $M = C_1 + K$. We have shown that, for $\omega \in \Omega_n$ (whose probability is at
least 1/2), the point $X_n(\omega)$ belongs to $\Lambda_{QC(\varepsilon,M)}$. □

5. Construction of maximizing measures

In this section, we prove Theorem 1.7: Given any finite subset $\Sigma$ in a hyperbolic group $\Gamma$,
there exists a measure $\mu_\Sigma$ maximizing the quantity $h(\mu)/\ell(\mu)$ over all measures $\mu$ supported
on $\Sigma$ with $\ell(\mu) > 0$. To prove this result, we start with a sequence of measures $\mu_i$ supported
on $\Sigma$ such that $h(\mu_i)/\ell(\mu_i)$ converges to the maximum $M$ of these quantities. We are looking
for $\mu_\Sigma$ with $h(\mu_\Sigma)/\ell(\mu_\Sigma) = M$. Replacing $\mu_i$ with $(\mu_i + \delta_e)/2$ (this multiplies entropy and
drift by 1/2, and does not change their ratio) and adding $e$ to $\Sigma$, we can always assume
$\mu_i(e) \ge 1/2$, to avoid periodicity problems.

Extracting a subsequence, we can ensure that $\mu_i$ converges to a limit probability measure
$\mu$. We treat separately the two following cases:

(1) $\Gamma_\mu$ is non-elementary.

(2) $\Gamma_\mu$ is elementary.

Let us handle first the easy case, where $\Gamma_\mu$ is non-elementary. In this case, the entropy
and the drift are continuous at $\mu$, by Proposition 2.3 and Theorem 2.9, both due to Erschler
and Kaimanovich in [EK13]. Therefore, $h(\mu_i)/\ell(\mu_i)$ tends to $h(\mu)/\ell(\mu)$, since in this case
$\ell(\mu) > 0$. One can thus take $\mu_\Sigma = \mu$.

The case where $\Gamma_\mu$ is elementary is much more interesting. Let us describe heuristically
what should happen, in a simple case. We assume that $\mu_i = (1 - \varepsilon)\mu + \varepsilon\nu$ where $\nu$ is a
fixed measure, and $\varepsilon$ tends to 0. The random walk for $\mu_i$ can be described as follows. At
each jump, one picks $\mu$ (with probability $1 - \varepsilon$) or $\nu$ (with probability $\varepsilon$), then one jumps
according to the chosen measure. After time $N$, the measure $\nu$ is chosen roughly $\varepsilon N$ times,
with intervals of length $1/\varepsilon$ in between, where $\mu$ is chosen. Thus, $\mu_i^{*N}$ behaves roughly like
$(\mu^{*1/\varepsilon} * \nu)^{*\varepsilon N}$.

When $\Gamma_\mu$ is finite, the measure $\mu^{*1/\varepsilon}$ is close, when $\varepsilon$ is small, to the uniform measure $\pi$ on
$\Gamma_\mu$. Therefore, $\mu_i^{*N}$ is close to $(\pi * \nu)^{*\varepsilon N}$. We deduce $h(\mu_i) \sim \varepsilon h(\pi * \nu)$ and $\ell(\mu_i) \sim \varepsilon \ell(\pi * \nu)$.
In particular, $h(\mu_i)/\ell(\mu_i) \to h(\pi * \nu)/\ell(\pi * \nu)$. One can take $\mu_\Sigma = \pi * \nu$.

When $\Gamma_\mu$ is infinite, it is virtually cyclic. Assuming that $\mu$ is centered for simplicity,
the walk given by $\mu^{*1/\varepsilon}$ arrives essentially at distance $1/\sqrt{\varepsilon}$ of the origin, by the central
limit theorem. Then, one jumps according to $\nu$, in a direction transverse to $\Gamma_\mu$, preventing
further cancellations. Hence, the walk given by $(\mu^{*1/\varepsilon} * \nu)^{*\varepsilon N}$ is at distance roughly $\varepsilon N / \sqrt{\varepsilon}$
from the origin, yielding $\ell(\mu_i) \sim \sqrt{\varepsilon}$. On the other hand, each step only visits $1/\varepsilon$
points, hence the measure $(\mu^{*1/\varepsilon} * \nu)^{*\varepsilon N}$ is supported by roughly $(1/\varepsilon)^{\varepsilon N}$ points, yielding
$h(\mu_i) \sim \varepsilon |\log \varepsilon|$. In particular, $h(\mu_i) = o(\ell(\mu_i))$. This implies that $h(\mu_i)/\ell(\mu_i)$, which tends
to 0, cannot tend to the maximum $M$. Therefore, this case cannot happen.
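
The orders of magnitude in the virtually cyclic case can be compared numerically. The sketch below only evaluates the two heuristic expressions $\varepsilon |\log \varepsilon|$ and $\sqrt{\varepsilon}$ (it does not compute the entropy or the drift of any random walk), and the function name is chosen for the example.

```python
from math import log, sqrt


def heuristic_ratio(eps: float) -> float:
    """Ratio of the heuristic orders: entropy ~ eps * |log eps|, drift ~ sqrt(eps)."""
    entropy = eps * abs(log(eps))
    drift = sqrt(eps)
    return entropy / drift


for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(eps, heuristic_ratio(eps))
```

The ratio equals $\sqrt{\varepsilon}\, |\log \varepsilon|$ and decreases to 0 along this sequence, in line with the claim $h(\mu_i) = o(\ell(\mu_i))$.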

The rigorous argument is considerably more delicate. One difficulty is that $\mu_i$ does not decompose in general as $(1-\varepsilon)\mu + \varepsilon\nu$: there can be in $\mu_i$ points with a very small probability (which are not seen by $\mu$), but much larger than $\varepsilon$, the probability to visit a nonelementary subset of $\Gamma$. These points will play an important role on the relevant time scale, i.e., $1/\varepsilon$. Hence, we have to describe the different time scales that happen in $\mu_i$.

For each $a \in \Sigma$, we have a weight $\mu_i(a)$, which tends to 0 if $a$ is not in the support of $\mu$. Reordering the $a_k$ and extracting a subsequence, we can assume that $\Sigma = \{a_1, \dots, a_p\}$ with $\mu_i(a_1) \ge \dots \ge \mu_i(a_p)$ (and $a_1 = e$). Extracting a further subsequence, we may also assume that $\mu_i(a_k)/\mu_i(a_{k-1})$ converges for all $k$, towards a limit in $[0,1]$.

Let $\Gamma_k$ be the subgroup generated by $a_1, \dots, a_k$. We consider the smallest $r$ such that $\Gamma_r$ is non-elementary. Then, we consider the biggest $s < r$ such that $\mu_i(a_r) = o(\mu_i(a_s))$. Roughly speaking, the random walk has enough time to spread on the elementary subgroup $\Gamma_s$, before seeing $a_r$. It turns out that the asymptotic behavior will depend on the nature of $\Gamma_s$ (finite or virtually cyclic infinite).

We will decompose the measure $\mu_i$ as the sum of two components $(1-\varepsilon_i)\alpha_i + \varepsilon_i\beta_i$, where $\varepsilon_i$ tends to 0, the measure $\alpha_i$ mainly lives on $\Gamma_s$, and the measure $\beta_i$ corresponds to the remaining part of $\mu_i$, on $\{a_{s+1}, \dots, a_p\}$. The precise construction depends on the nature of $\Gamma_s$:

• If $\Gamma_s$ is finite. Let $\beta_i^{(0)}$ be the normalized restriction of $\mu_i$ to $\{a_{s+1}, \dots, a_p\}$. To avoid periodicity problems, we rather consider $\beta_i = (\delta_e + \beta_i^{(0)})/2$. We decompose $\mu_i = (1-\varepsilon_i)\alpha_i + \varepsilon_i\beta_i$, where $\alpha_i$ is supported on $a_1, \dots, a_s$. By construction, the probability of any element in the support of $\alpha_i$ is much bigger than $\varepsilon_i$.
• If $\Gamma_s$ is virtually cyclic infinite. The group $\Gamma_s$ contains a hyperbolic element $g_0$, with repelling and attracting points at infinity denoted by $g_0^-$ and $g_0^+$. The elements of $\Gamma_s$ all fix the set $\{g_0^-, g_0^+\}$. We take for $\alpha_i$ the normalized restriction of $\mu_i$ to those elements in $\Sigma$ that fix $\{g_0^-, g_0^+\}$, and for $\beta_i$ the normalized restriction of $\mu_i$ to the other elements. Once again, we can write $\mu_i = (1-\varepsilon_i)\alpha_i + \varepsilon_i\beta_i$.

In both cases, $\varepsilon_i$ is comparable to the probability $\mu_i(a_r)$, and is therefore negligible with respect to $\mu_i(a_s)$. We will write $\mu_i = \mu_\varepsilon$ (and, in the same way, we will replace all indices $i$ with $\varepsilon$, since the main parameter is $\varepsilon = \varepsilon_i$). The measure $\mu_\varepsilon$ converges to $\mu$ when $\varepsilon$ tends to 0, while $\beta_\varepsilon$ tends to a probability measure $\beta$, supported on $e, a_{s+1}, \dots, a_p$. If the measures $\mu_\varepsilon$ are symmetric to begin with, the measures $\alpha_\varepsilon$ and $\beta_\varepsilon$ are also symmetric by construction.

To generate the random walk given by $\mu_\varepsilon$, one can first independently choose random measures $\rho_n$: one takes $\rho_n = \alpha_\varepsilon$ with probability $1-\varepsilon$, and $\rho_n = \beta_\varepsilon$ with probability $\varepsilon$. Then, one chooses elements $g_n$ randomly according to $\rho_n$, and one multiplies them: the product $g_1 \cdots g_n$ is distributed like the random walk given by $\mu_\varepsilon$ at time $n$.

We will group together successive $g_n$ into blocks where the equidistribution on $\Gamma_s$ can be seen. More precisely, denote by $t_1, t_2, \dots$ the successive times where $\rho_n = \beta_\varepsilon$ (and $t_0 = 0$). They are stopping times, the successive differences are independent and identically distributed, with a geometric distribution of parameter $\varepsilon$ (i.e., $P(t_1 = n) = (1-\varepsilon)^{n-1}\varepsilon$), with mean $1/\varepsilon$. Write $L_N = g_{t_{N-1}+1} \cdots g_{t_N}$. By construction, the $L_N$ are independent, identically distributed, and the random walk they define, i.e., $L_1 \cdots L_N$, is a subsequence of the original random walk $g_1 \cdots g_n$. Let $\lambda_\varepsilon$ be the distribution of $L_N$ on $\Gamma$, i.e.,

$$\lambda_\varepsilon = \sum_{n=0}^{\infty} (1-\varepsilon)^n \varepsilon\, \alpha_\varepsilon^{*n} * \beta_\varepsilon.$$
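The block structure can be illustrated by a small simulation. The sketch below only simulates the choice of measures, not the group elements: at each step it draws "beta" with probability eps and "alpha" otherwise, and records the lengths $t_N - t_{N-1}$ of the successive blocks. The function name `block_lengths` and the parameter values are assumptions chosen for illustration.

```python
import random


def block_lengths(eps: float, num_blocks: int, seed: int = 0) -> list:
    """Simulate the lengths t_N - t_{N-1} of successive blocks.

    At each time step the measure beta is chosen with probability eps and
    alpha with probability 1 - eps; a block ends at the first time beta is
    chosen, so each length is geometric with parameter eps.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(num_blocks):
        n = 1
        # Keep drawing alpha (probability 1 - eps) until beta is drawn.
        while rng.random() >= eps:
            n += 1
        lengths.append(n)
    return lengths


sample = block_lengths(eps=0.05, num_blocks=20000, seed=1)
empirical_mean = sum(sample) / len(sample)  # should be close to 1 / eps = 20
```

The empirical mean of the block lengths is close to $1/\varepsilon$, in line with the time scale $1/\varepsilon$ on which $\lambda_\varepsilon$ describes the walk.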

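Lemma 5.1. The measure $\lambda_\varepsilon$ has finite first moment and finite time one entropy. Moreover, $\ell(\mu_\varepsilon) = \varepsilon\ell(\lambda_\varepsilon)$ and $h(\mu_\varepsilon) = \varepsilon h(\lambda_\varepsilon)$.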

Proof. As the mean of $t_1$ is $1/\varepsilon$, the random walk generated by $\lambda_\varepsilon$ is essentially the random walk generated by $\mu_\varepsilon$, but on a time scale $1/\varepsilon$. This justifies heuristically the statement.

For the rigorous proof, let us first check that $\lambda_\varepsilon$ has finite first moment (and hence finite time one entropy). Since all the measures have finite support, we have $|L_1| \le C t_1$. Since a geometric distribution has moments of all orders, the same is true for $|L_1|$.

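The strong law of large numbers ensures that, almost surely, $t_N \sim N/\varepsilon$. Therefore, almost surely,

$$\ell(\lambda_\varepsilon) = \lim \frac{|L_1 \cdots L_N|}{N} = \lim \frac{|g_1 \cdots g_{t_N}|}{N} = \lim \frac{|g_1 \cdots g_{t_N}|}{t_N} \cdot \frac{t_N}{N} = \ell(\mu_\varepsilon) \cdot 1/\varepsilon.$$

This proves the statement of the lemma for the drift.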

For the entropy, we use the characterization of Lemma 2.4. We will show that $h(\mu_\varepsilon) \ge \varepsilon h(\lambda_\varepsilon)$ and $h(\mu_\varepsilon) \le \varepsilon h(\lambda_\varepsilon)$. Let $K_n$ be a set of cardinality at most $e^{(h(\mu_\varepsilon)+\eta)n}$ which contains $g_1 \cdots g_n$ with probability at least $1/2$. Let $N = \varepsilon n$. With large probability, $t_N$ is close to $n$, up to $\eta' n$ (where $\eta'$ is arbitrarily small). Hence, with probability at least $1/3$, the point $L_1 \cdots L_N$ belongs to the $C\eta' n$-neighborhood of $K_n$, whose cardinality is at most

$$|K_n| \cdot e^{C'\eta' n} \le e^{(h(\mu_\varepsilon)+\eta+C'\eta')n} = e^{(h(\mu_\varepsilon)+\eta+C'\eta')N/\varepsilon}.$$

As $\eta$ and $\eta'$ are arbitrary, this shows that $h(\lambda_\varepsilon) \le h(\mu_\varepsilon)/\varepsilon$. The converse inequality is proved in the same way. □

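The previous lemma shows that we should understand $\lambda_\varepsilon$. We define an auxiliary probability measure $\tilde\alpha_\varepsilon$ so that $\lambda_\varepsilon = \tilde\alpha_\varepsilon * \beta_\varepsilon$, by

$$\tilde\alpha_\varepsilon = \sum_{n=0}^{\infty} (1-\varepsilon)^n \varepsilon\, \alpha_\varepsilon^{*n}. \tag{5.1}$$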

In this formula, most weight is concentrated around those $n$ of the order of $1/\varepsilon$. Hence, we have to understand the iterates of $\alpha_\varepsilon$ in time $1/\varepsilon$. When $\Gamma_s$ is finite, we will see that it has enough time to equidistribute on $\Gamma_s$ (even though $\alpha_\varepsilon$ may give a very small weight to some elements, this weight is by construction much larger than $\varepsilon$, so that $1/\varepsilon$ iterates are enough to equidistribute). When $\Gamma_s$ is virtually cyclic, we will see that the random walk has enough time to drift away significantly from the identity.

In both cases, we will need quantitative results on basic groups, but in weakly elliptic cases 
(i.e., the transition probabilities are not bounded from below). There are techniques to get 
quantitative estimates in such settings, especially comparison techniques (due for instance 
to Varopoulos, Diaconis, Saloff-Coste): one can compare weakly elliptic walks to elliptic 
ones (which we understand well) thanks to Dirichlet forms arguments: these arguments 
make it possible to transfer results from the latter to the former (modulo some loss in the 
constants, due to the lack of ellipticity). We will rely on such results when $\Gamma_s$ is infinite.
When it is finite, such techniques can also be used, but we will rather give a more elementary 
argument. 

We start with the case where $\Gamma_s$ is finite. We need to quantify the speed of convergence
to the stationary measure in finite groups, with the following lemma. 

Lemma 5.2. Let $\Lambda$ be a finite group. Let $\Sigma_\Lambda \subset \Lambda$ be a generating subset (it does not have to be symmetric). Let $\pi_\Lambda$ be the uniform measure on $\Lambda$, and let $d(\mu, \pi_\Lambda)$ be the Euclidean distance between a measure $\mu$ and $\pi_\Lambda$ (i.e., $(\sum_{g \in \Lambda} (\mu(g) - \pi_\Lambda(g))^2)^{1/2}$). For any $\delta > 0$, there exists $K > 0$ with the following property. Let $\eta > 0$. Consider a probability measure $\mu$ on $\Lambda$ with $\mu(a) \ge \eta$ for any $a \in \Sigma_\Lambda \cup \{e\}$. Then, for all $n > K/\eta$,

$$d(\mu^{*n}, \pi_\Lambda) \le \delta.$$

In other words, the time to see the equidistribution towards the stationary measure is bounded by $1/\eta$, where $\eta$ is the minimum of the transition probabilities on $\Sigma_\Lambda$.
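The $1/\eta$ time scale of Lemma 5.2 can be observed on a toy case. The following sketch assumes $\Lambda = \mathbb{Z}/5$, $\Sigma_\Lambda = \{1\}$ and the lazy measure $\mu = (1-\eta)\delta_0 + \eta\delta_1$ (all chosen for illustration, with hypothetical function names); it computes $\mu^{*n}$ by repeated convolution and returns its Euclidean distance to the uniform measure.

```python
import math


def convolve_cyclic(p: list, q: list) -> list:
    """Convolution of two measures on Z/m, given as lists of length m."""
    m = len(p)
    out = [0.0] * m
    for i, pi in enumerate(p):
        if pi == 0.0:
            continue
        for j, qj in enumerate(q):
            out[(i + j) % m] += pi * qj
    return out


def distance_to_uniform(eta: float, n: int, m: int = 5) -> float:
    """Euclidean distance between mu^{*n} and the uniform measure on Z/m,

    for the (assumed) test measure mu = (1 - eta) delta_0 + eta delta_1.
    """
    mu = [0.0] * m
    mu[0] = 1.0 - eta
    mu[1] = eta
    power = [1.0] + [0.0] * (m - 1)  # delta_e, i.e. mu^{*0}
    for _ in range(n):
        power = convolve_cyclic(power, mu)
    return math.sqrt(sum((x - 1.0 / m) ** 2 for x in power))


# Distances at time n = 10 / eta for several values of eta.
distances = [distance_to_uniform(eta, math.ceil(10 / eta)) for eta in (0.1, 0.01, 0.001)]
```

In this toy case, running the walk for a time proportional to $1/\eta$ brings $\mu^{*n}$ uniformly close to $\pi_\Lambda$, whatever the value of $\eta$.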

Proof. Endow the space $\mathcal{M}(\Lambda)$ of signed measures on $\Lambda$ with the scalar product corresponding to the quadratic form $|\nu|^2 = \sum_{g \in \Lambda} \nu(g)^2$. Denote by $H = \{\nu : \sum_{g} \nu(g) = 0\}$ the hyperplane of zero mass measures. For any probability $\mu$, denote by $M_\mu$ the left-convolution operator on $\mathcal{M}(\Lambda)$, that is $M_\mu(\nu) = \mu * \nu$. Since convolution preserves mass, $H$ is $M_\mu$-invariant. Let us prove that the operator norm of $M_\mu$ is bounded by 1. Indeed, put $u_\mu(g) = \sum_{h \in \Lambda} \mu(h)\mu(hg)$; this is a probability on $\Lambda$. We have

$$|M_\mu \nu|^2 = \sum_{g \in \Lambda} (M_\mu \nu(g))^2 = \sum_{(g,h_1,h_2) \in \Lambda^3} \mu(g h_1^{-1}) \mu(g h_2^{-1}) \nu(h_1) \nu(h_2)$$

$$= \sum_{(h_1,h_2) \in \Lambda^2} \nu(h_1)\nu(h_2) u_\mu(h_1 h_2^{-1}) = \sum_{(g,h) \in \Lambda^2} \nu(h)\nu(g^{-1}h) u_\mu(g)$$

$$\le \sum_{g \in \Lambda} |\nu|^2 u_\mu(g) = |\nu|^2.$$

This proves that $\|M_\mu\| \le 1$. Now fix $\mu_0$ to be the uniform probability on the set $\Sigma_\Lambda \cup \{e\}$. Notice that $u_{\mu_0}(g) > 0$ for any $g \in \Sigma_\Lambda \cup \{e\}$, since $\mu_0(e) > 0$. We claim that $M_{\mu_0}$ restricted to $H$ has an operator norm $c < 1$. If this were not the case, there would exist $\nu \in H \setminus \{0\}$ such that the previous inequalities would be equalities. Thanks to the equality case in the Cauchy-Schwarz inequality, this implies that, for any $g \in \Sigma_\Lambda$, the two measures $h \mapsto \nu(h)$ and $h \mapsto \nu(g^{-1}h)$ are positively proportional. Since their norms are equal, they must be equal. Since $\Sigma_\Lambda$ generates $\Lambda$, $\nu$ is $\Lambda$-invariant and belongs to $H$, so it must be zero.

By assumption, the probability $\mu$ can be decomposed as

$$\mu = \eta\mu_0 + (1-\eta)\nu,$$

where $\nu$ is some probability. This implies that $M_\mu$ restricted to $H$ has operator norm at most $\eta c + (1-\eta)$. Therefore,

$$d(\mu^{*n}, \pi_\Lambda) = |\mu^{*n} - \pi_\Lambda| = |M_\mu^n(\delta_e - \pi_\Lambda)| \le 2(1 - (1-c)\eta)^n.$$

This inequality implies the result. □

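We can now describe the asymptotic behavior of $\mu_\varepsilon$ when the group $\Gamma_s$ is finite.

Lemma 5.3. Assume that $\Gamma_s$ is finite. Define a new probability measure $\lambda = \pi_{\Gamma_s} * \beta$ (it generates a non-elementary subgroup). When $\varepsilon$ tends to 0, we have $h(\mu_\varepsilon) \sim \varepsilon h(\lambda)$ and $\ell(\mu_\varepsilon) \sim \varepsilon \ell(\lambda)$.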

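Proof. The random variable $t_1$, being geometric of parameter $\varepsilon$, is of the order of $1/\varepsilon$ with high probability (i.e., for any $\delta > 0$, there exists $u > 0$ such that $P(t_1 > u/\varepsilon) \ge 1 - \delta$). Writing $\Sigma_s = \{a_1, \dots, a_s\}$ for the support of $\alpha_\varepsilon$, we have $\min_{\sigma \in \Sigma_s} \alpha_\varepsilon(\sigma) = (1-\varepsilon)^{-1}\mu_\varepsilon(a_s)$, which is much bigger than $\varepsilon$ by definition of $s$. Lemma 5.2 shows that the measures $\alpha_\varepsilon^{*n}$ are close to $\pi_{\Gamma_s}$ for $n \ge u/\varepsilon$. This implies that $\tilde\alpha_\varepsilon$ (defined in (5.1)) converges to $\pi_{\Gamma_s}$ when $\varepsilon \to 0$. As $\beta_\varepsilon$ converges to $\beta$, this shows that $\lambda_\varepsilon$ converges to $\lambda$.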

The support of the measure $\lambda$ contains $\Gamma_s$ and $a_{s+1}, \dots, a_r$ (as the support of $\beta$ contains $\{e, a_{s+1}, \dots, a_r\}$ by construction). Hence, $\Gamma_\lambda$ contains the non-elementary subgroup $\Gamma_r$. It follows that the entropy and the drift are continuous at $\lambda$, by Proposition 2.3 and Theorem 2.9. We get $h(\lambda_\varepsilon) \to h(\lambda)$ and $\ell(\lambda_\varepsilon) \to \ell(\lambda)$. With Lemma 5.1, this completes the proof. □

We deduce from the lemma that $h(\mu_\varepsilon)/\ell(\mu_\varepsilon)$ tends to $h(\lambda)/\ell(\lambda)$. Hence, the measure $\mu_\Sigma = \lambda$ satisfies the conclusion of the theorem, at least in the non-symmetric case. In the symmetric case, where we are looking for a symmetric measure $\mu_\Sigma$, the measure $\lambda = \pi_{\Gamma_s} * \beta$ is not an answer to the problem. However, $\lambda' = \pi_{\Gamma_s} * \beta * \pi_{\Gamma_s}$ is symmetric, and it clearly has the same entropy and drift as $\lambda$ (since $\pi_{\Gamma_s} * \pi_{\Gamma_s} = \pi_{\Gamma_s}$). Hence, we can take $\mu_\Sigma = \lambda'$. This completes the proof of Theorem 1.7 when the group $\Gamma_s$ is finite.

Example 5.4. Let $\Gamma = \mathbb{Z}/2 * \mathbb{Z}/4$, with $\Sigma = \{a, b, b^{-1}\}$ (where $a$ is the generator of $\mathbb{Z}/2$ and $b$ the generator of $\mathbb{Z}/4$), with the word distance coming from $\Sigma$. [MM07, Section 5.1] shows that the supremum over measures supported on $\Sigma$ of $h(\mu)/\ell(\mu)$ is the growth $v$ of the group (note that $\Gamma$ is virtually free), and that it is not realized by a measure supported on $\Sigma$. This shows that, in Theorem 1.7, the fact that $\mu_\Sigma$ may need a support larger than $\Sigma$ is not an artefact of the proof.

In this example, any symmetric measure on $\Sigma$ is of the form $\mu_\varepsilon = (1-\varepsilon)\delta_a + \varepsilon\beta$ where $\beta$ is uniform on $\{b, b^{-1}\}$. The above proof shows that, when $\varepsilon$ tends to 0, $h(\mu_\varepsilon)/\ell(\mu_\varepsilon)$ converges to $h(\lambda)/\ell(\lambda)$ where $\lambda = \pi_{\Gamma_s} * \beta = \frac{1}{2}(\delta_e + \delta_a) * \frac{1}{2}(\delta_b + \delta_{b^{-1}})$ is the uniform measure on $\{b, b^{-1}, ab, ab^{-1}\}$.
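The convolution in Example 5.4 can be checked mechanically. In the sketch below, elements are encoded as words (the empty string for $e$, and the letter `B` standing for $b^{-1}$; this encoding is an assumption made for illustration). Since a word in $\{e, a\}$ followed by $b^{\pm 1}$ never simplifies in the free product, convolution reduces to concatenation here.

```python
from fractions import Fraction


def convolve(p: dict, q: dict) -> dict:
    """Convolution of two finitely supported measures on words.

    Products are computed by plain concatenation, which is only valid when
    no cancellation occurs between the two factors (as is the case below).
    """
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0) + px * qy
    return out


half = Fraction(1, 2)
pi_s = {"": half, "a": half}    # uniform measure on the subgroup {e, a}
beta = {"b": half, "B": half}   # uniform measure on {b, b^{-1}}; "B" encodes b^{-1}
lam = convolve(pi_s, beta)      # the measure pi * beta of Example 5.4
```

The result gives mass $1/4$ to each of the four words `b`, `B`, `ab`, `aB`, i.e. the uniform measure on $\{b, b^{-1}, ab, ab^{-1}\}$.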

It remains to treat the case where $\Gamma_s$ is virtually cyclic infinite. Such a group surjects onto $\mathbb{Z}$ or $\mathbb{Z} \rtimes \mathbb{Z}/2$ (the infinite dihedral group), with finite kernel. From the point of view of the random walk, most things happen in the quotient. Hence, it would suffice to understand these two groups (separating in the case of $\mathbb{Z}$ the centered and non-centered cases). We will rather give direct arguments which do not use this reduction and which avoid separating cases. Let $t \le s$ be the smallest index such that $\{a_1, \dots, a_t\}$ generates an infinite group. Let $\eta = \eta(\varepsilon) = \mu_\varepsilon(a_t)$; this parameter governs the equidistribution speed on $\Gamma_s$ (or, at least, on $\Gamma_t$, which has finite index in $\Gamma_s$ since these two groups are virtually cyclic infinite). We will find the asymptotics of the entropy and the drift in terms of $\eta/\varepsilon$ (which tends to infinity by definition of $s$). We start with the entropy (for which an upper bound suffices). Note that the random walk directed by $\alpha_\varepsilon$ does not live on $\Gamma_s$, but on a possibly bigger group since we have put in $\alpha_\varepsilon$ all the points that fix the set $\{g_0^-, g_0^+\}$ (this will be important in the control of the drift below). Let $\tilde\Gamma_s$ be the group they generate; it is still virtually cyclic (see, for instance, [GdlH90, Théorème 37 page 157]), and it contains $\Gamma_s$ as a finite index subgroup.

Lemma 5.5. There exists a constant $C$ such that $h(\lambda_\varepsilon) \le C\log(\eta/\varepsilon)$.

Proof. Let $K$ be the group generated by $\{a_1, \dots, a_{t-1}\}$. It is finite by definition of $t$. Let $\Sigma'$ be the set of points among $a_t, \dots, a_p$ which stabilize $\{g_0^-, g_0^+\}$. The group $\tilde\Gamma_s$ is generated by $K$ and $\Sigma'$. Let us consider the associated word pseudo-distance $d'$, where we decide that elements in $K$ have 0 length. This pseudo-distance is quasi-isometric to the usual distance, and it satisfies $d'(e, xk) = d'(e, x)$ for all $x \in \tilde\Gamma_s$ and all $k \in K$.

Let us first estimate the average distance to the origin for an element given by $\tilde\alpha_\varepsilon$. We decompose $\alpha_\varepsilon$ as the average of a measure supported on $\{a_1, \dots, a_{t-1}\} \subset K$, and of a measure supported on $\Sigma'$ (the contribution of the latter has a mass $m(\varepsilon)$ bounded by $(p - t + 1)\eta \le C\eta$). The measure $\alpha_\varepsilon^{*n}$ can be obtained by picking at each step one of these two measures (according to their respective weight), and then jumping according to a random element for this measure. When we use the first measure, the $d'$-distance to the origin does not change by definition. Hence, the distance to the origin is bounded by the number of choices of the second measure. We obtain

$$E_{\tilde\alpha_\varepsilon}(d'(e,g)) \le \sum_{n=0}^{\infty} (1-\varepsilon)^n \varepsilon \sum_{i=0}^{n} \binom{n}{i} m(\varepsilon)^i (1 - m(\varepsilon))^{n-i} \cdot Ci$$

$$= C m(\varepsilon) \sum_{n=0}^{\infty} (1-\varepsilon)^n \varepsilon \sum_{i=1}^{n} n \binom{n-1}{i-1} m(\varepsilon)^{i-1} (1 - m(\varepsilon))^{n-i}$$

$$= C m(\varepsilon) \sum_{n=0}^{\infty} (1-\varepsilon)^n \varepsilon n = C m(\varepsilon)(1-\varepsilon)/\varepsilon \le C\eta/\varepsilon.$$

A measure supported on the integers with first moment $A$ has entropy bounded by $C\log A + C$ (see, for instance, [EK10, Lemma 2]). The proof also applies to virtually cyclic situations (the finite thickening does not change anything). Therefore, we get $H(\tilde\alpha_\varepsilon) \le C\log(\eta/\varepsilon) + C$.

Finally,

$$H(\lambda_\varepsilon) = H(\tilde\alpha_\varepsilon * \beta_\varepsilon) \le H(\tilde\alpha_\varepsilon) + H(\beta_\varepsilon) \le C\log(\eta/\varepsilon) + C,$$

since the support of $\beta_\varepsilon$ is uniformly bounded. As $\eta/\varepsilon \to \infty$, this gives $H(\lambda_\varepsilon) \le C'\log(\eta/\varepsilon)$. Finally, we estimate $h(\lambda_\varepsilon) = \inf_{n>0} H(\lambda_\varepsilon^{*n})/n \le H(\lambda_\varepsilon)$ to get the conclusion of the lemma. □


For the drift, we need to be more precise since we need a lower bound to conclude. We 
will use a lemma giving lower bounds on the equidistribution speed in virtually cyclic infinite 
groups, using comparison techniques. 

Lemma 5.6. Let $\Lambda$ be a virtually cyclic infinite group. Let $\Sigma_\Lambda \subset \Lambda$ be a finite subset generating an infinite subgroup of $\Lambda$. There exists a constant $C$ with the following property. Let $\eta > 0$. Let $\mu$ be a probability measure on $\Lambda$ with $\mu(e) \ge 1/2$ and $\mu(a) \ge \eta$ for any $a \in \Sigma_\Lambda$. Then, for all $n \ge 1$,

$$\sup_{g \in \Lambda} \mu^{*n}(g) \le C(\eta n)^{-1/2}.$$

The interest of the lemma is that $C$ does not depend on the measure $\mu$, and that we obtain an explicit control on $\mu^{*n}$ just in terms of a lower bound on the transition probabilities of $\mu$.
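The uniformity in $\eta$ of this bound can be observed on the simplest case. The sketch below assumes $\Lambda = \mathbb{Z}$, $\Sigma_\Lambda = \{1\}$ and the test measure $\mu = (1-2\eta)\delta_0 + \eta\delta_1 + \eta\delta_{-1}$ with $\eta \le 1/4$ (so that $\mu(e) \ge 1/2$); these choices and the function names are assumptions made for illustration. It computes $\sup_g \mu^{*n}(g)$ by repeated convolution and rescales it by $(\eta n)^{1/2}$.

```python
import math


def convolve_on_z(p: dict, q: dict) -> dict:
    """Convolution of two finitely supported measures on the integers."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out


def sup_of_convolution_power(eta: float, n: int) -> float:
    """sup_g mu^{*n}(g) for mu = (1 - 2 eta) delta_0 + eta (delta_1 + delta_{-1})."""
    mu = {0: 1.0 - 2.0 * eta, 1: eta, -1: eta}
    power = {0: 1.0}  # delta_0, i.e. mu^{*0}
    for _ in range(n):
        power = convolve_on_z(power, mu)
    return max(power.values())


def normalized_sup(eta: float, n: int) -> float:
    """The quantity sup_g mu^{*n}(g) * (eta * n)^{1/2}, bounded by C in Lemma 5.6."""
    return sup_of_convolution_power(eta, n) * math.sqrt(eta * n)


values = [normalized_sup(eta, n) for eta, n in ((0.25, 1), (0.25, 50), (0.05, 200), (0.01, 400))]
```

For these parameters the rescaled quantity stays below 1, whether $\eta n$ is small or large, which is consistent with a constant $C$ that does not depend on $\eta$.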

Proof. We use the comparison method. Let $\rho$ be the uniform measure on $e$, $\Sigma_\Lambda$ and $\Sigma_\Lambda^{-1}$. The random walk it generates does not have to be transitive (since $\Sigma_\Lambda$ does not necessarily generate the whole group $\Lambda$), but $\Lambda$ is partitioned into finitely many classes where it is transitive (and isomorphic to the random walk on the group generated by $\Sigma_\Lambda$). Moreover, it is symmetric, and therefore reversible for the counting measure $m$ on $\Lambda$. The Dirichlet form associated to $\rho$ is by definition

$$\mathcal{E}_\rho(f, f) = \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2 \rho(x^{-1}y),$$

for any $f : \Lambda \to \mathbb{C}$. As $\Lambda$ has linear growth, the following Nash inequality holds (see, for instance, [Woe00, Proposition 14.1]):

$$\|f\|_{L^2}^6 \le C \|f\|_{L^1}^4\, \mathcal{E}_\rho(f, f),$$




where all norms are defined with respect to the measure $m$ on $\Lambda$. Let $P_\mu$ be the Markov operator associated to $\mu$. It satisfies

$$\|f\|_{L^2}^2 - \|P_\mu f\|_{L^2}^2 = \langle f, f \rangle - \langle P_\mu f, P_\mu f \rangle = \langle (I - P_\mu^* P_\mu) f, f \rangle.$$

The operator $P_\mu^* P_\mu$ is the Markov operator associated to the symmetric probability measure $\nu = \check\mu * \mu$, which satisfies $\nu(a) \ge \eta/2$ for $a \in \Sigma_\Lambda \cup \Sigma_\Lambda^{-1}$ and $\nu(e) \ge 1/4$ (since $\mu(e) \ge 1/2$). Therefore, $\rho(g) \le C\eta^{-1}\nu(g)$ for all $g$. We deduce

$$\|f\|_{L^2}^2 - \|P_\mu f\|_{L^2}^2 = \sum_{x,y} \overline{f(x)}(f(x) - f(y)) \nu(x^{-1}y) = \frac{1}{2} \sum_{x,y} |f(x) - f(y)|^2 \nu(x^{-1}y)$$

$$\ge \frac{\eta}{2C} \sum_{x,y} |f(x) - f(y)|^2 \rho(x^{-1}y) = \frac{\eta}{C}\, \mathcal{E}_\rho(f, f).$$

Combining this inequality with the Nash inequality, we obtain

$$\|f\|_{L^2}^6 \le C\eta^{-1} \|f\|_{L^1}^4 \left( \|f\|_{L^2}^2 - \|P_\mu f\|_{L^2}^2 \right).$$

The operator $P_\mu^*$ satisfies the same inequality, for the same reason. Composing these inequalities, we obtain an estimate for the norm of $P_\mu^n$ from $L^1$ to $L^\infty$ (this is [VSCC92, Lemma VII.2.6]), of the form

$$\|P_\mu^n\|_{L^1 \to L^\infty} \le (C\eta^{-1}/n)^{1/2}.$$

Applying this inequality to the function $\delta_e$, we get the desired result. □

The previous lemma implies that, if $C'$ is large enough, a neighborhood of size $(\eta n)^{1/2}/C'$ of the identity has probability for $\mu^{*n}$ at most $1/2$. Hence, the average distance to the origin is at least of the order of $(\eta n)^{1/2}$.

Now, we study the stationary measure for $\beta_\varepsilon * \tilde\alpha_\varepsilon$ on $\partial\Gamma$. We recall that $g_0$ is a hyperbolic element in $\Gamma_s$, fixed once and for all.

Lemma 5.7. There exists a neighborhood $U$ of $\{g_0^-, g_0^+\}$ in $\partial\Gamma$ such that the stationary measure $\nu_\varepsilon$ of $\beta_\varepsilon * \tilde\alpha_\varepsilon$ satisfies $\nu_\varepsilon(U) \to 0$.

Proof. Let us first show that, for any neighborhood $U$ of $\{g_0^-, g_0^+\}$, the quantity $(\tilde\alpha_\varepsilon * \delta_z)(U^c)$ tends to 0, uniformly in $z \in \partial\Gamma$. This is not surprising since a typical element for $\tilde\alpha_\varepsilon$ is large in the virtually cyclic group $\tilde\Gamma_s$, and sends most points into $U$. To make this argument rigorous, we will use Lemma 5.6. The definition (5.1) shows that it suffices to prove that $(\alpha_\varepsilon^{*n} * \delta_z)(U^c)$ is small for $n \ge u/\varepsilon$.

The subgroup generated by $g_0$ has finite index in $\tilde\Gamma_s$. Hence, any element in $\tilde\Gamma_s$ can be written as $g_0^k\gamma_i$, for $\gamma_i$ in a finite set. Thus, the measure $\alpha_\varepsilon^{*n}$ can be written as $\sum_{k,i} c_n(k,i)\delta_{g_0^k\gamma_i}$, for some coefficients $c_n(k,i)$. Lemma 5.6 (applied to $\Lambda = \tilde\Gamma_s$ with $\Sigma_\Lambda = \{a_1, \dots, a_t\}$) ensures that $\sup_{k,i} c_n(k,i) \le C/(\eta n)^{1/2}$. When $n \ge u/\varepsilon$, this quantity tends to 0 since $\varepsilon = o(\eta)$. We have

$$(\alpha_\varepsilon^{*n} * \delta_z)(U^c) = \sum_{k,i} c_n(k,i)\, 1(g_0^k\gamma_i z \notin U).$$

As the element $g_0$ is hyperbolic, there exists $C$ such that, for any $w \in \partial\Gamma$,

$$|\{k \in \mathbb{Z} : g_0^k w \notin U\}| \le C.$$
The uniformity in $w$ follows from the compactness of $(\partial\Gamma \setminus \{g_0^-, g_0^+\})/\langle g_0 \rangle$. We obtain

$$(\alpha_\varepsilon^{*n} * \delta_z)(U^c) \le \sum_i \Big( \sup_{k,i} c_n(k,i) \Big) |\{k \in \mathbb{Z} : g_0^k\gamma_i z \notin U\}| \le C \sup_{k,i} c_n(k,i) \le C/(\eta n)^{1/2}.$$

This shows that $(\alpha_\varepsilon^{*n} * \delta_z)(U^c)$ is small, as desired.

As $(\tilde\alpha_\varepsilon * \delta_z)(U^c)$ tends to 0 uniformly in $z$, we deduce that $(\tilde\alpha_\varepsilon * \nu_\varepsilon)(U^c)$ also tends to 0, and therefore that $(\tilde\alpha_\varepsilon * \nu_\varepsilon)(U)$ tends to 1.

Let $A = \{g_0^-, g_0^+\}$. We claim that, for all $g$ such that $gA \cap A \ne \emptyset$, then $gA = A$. Indeed, if $g(g_0^+) \in A$ for instance, then $g^{-1}g_0g$ is a hyperbolic element stabilizing $g_0^+$. It also stabilizes $g_0^-$, by [GdlH90, Théorème 30 page 154], i.e., $g_0g(g_0^-) = g(g_0^-)$. Hence, $g(g_0^-)$ is a fixed point of $g_0$, i.e., $g(g_0^-) \in A$.

By definition of $\beta_\varepsilon$, the finitely many elements of its support do not fix $A$. They even satisfy $gA \cap A = \emptyset$ for all $g$ in this support, by the previous argument. If $U$ is small enough, we get $gU \cap U = \emptyset$, i.e., $g(U) \subset U^c$.

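Finally,

$$\nu_\varepsilon(U^c) = (\beta_\varepsilon * \tilde\alpha_\varepsilon * \nu_\varepsilon)(U^c) \ge (\tilde\alpha_\varepsilon * \nu_\varepsilon)(U),$$

which tends to 1 when $\varepsilon$ tends to 0. □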

Lemma 5.8. The drift $\ell(\lambda_\varepsilon)$ satisfies $\ell(\lambda_\varepsilon) \ge c \cdot (\eta/\varepsilon)^{1/2}$.

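Proof. Let $\rho_\varepsilon$ be a stationary measure for $\lambda_\varepsilon$, on the Busemann boundary $\partial_B\Gamma$. By Proposition 2.2,

$$\ell(\lambda_\varepsilon) = \int c_B(g, \xi)\, d\rho_\varepsilon(\xi)\, d\lambda_\varepsilon(g),$$

where $c_B(g, \xi) = h_\xi(g^{-1})$ is the Busemann cocycle. As $\lambda_\varepsilon = \tilde\alpha_\varepsilon * \beta_\varepsilon$, this gives

$$\ell(\lambda_\varepsilon) = \int c_B(Lb, \xi)\, d\rho_\varepsilon(\xi)\, d\tilde\alpha_\varepsilon(L)\, d\beta_\varepsilon(b).$$

With the cocycle relation (2.2), this becomes

$$\ell(\lambda_\varepsilon) = \int c_B(L, b\xi)\, d\rho_\varepsilon(\xi)\, d\tilde\alpha_\varepsilon(L)\, d\beta_\varepsilon(b) + \int c_B(b, \xi)\, d\rho_\varepsilon(\xi)\, d\tilde\alpha_\varepsilon(L)\, d\beta_\varepsilon(b).$$

The second integral is bounded independently of $\varepsilon$ since the support of $\beta_\varepsilon$ is finite. In the first integral, $\xi' = b\xi$ is distributed according to the measure $\rho'_\varepsilon := \beta_\varepsilon * \rho_\varepsilon$, which is stationary for $\beta_\varepsilon * \tilde\alpha_\varepsilon$. Lemma 5.7 implies that its projection $(\pi_B)_*\rho'_\varepsilon$ on the geometric boundary, which is again stationary for $\beta_\varepsilon * \tilde\alpha_\varepsilon$, gives a small measure to a neighborhood $U$ of $\{g_0^-, g_0^+\}$.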

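As the limit set of $\tilde\Gamma_s$ is $\{g_0^-, g_0^+\}$, there exists a constant $C$ such that, for all $\xi \notin \pi_B^{-1}U$ and $g \in \tilde\Gamma_s$, we have $|h_\xi(g^{-1}) - d(e,g)| \le C$. For $\xi \in \pi_B^{-1}U$, we only use the trivial bound $h_\xi(g^{-1}) \ge -d(e,g)$, since horofunctions are 1-Lipschitz and vanish at the origin. We get

$$\ell(\lambda_\varepsilon) \ge \int_{(L,\xi) \in \Gamma \times \pi_B^{-1}U^c} d(e,L)\, d\tilde\alpha_\varepsilon(L)\, d\rho'_\varepsilon(\xi) - \int_{(L,\xi) \in \Gamma \times \pi_B^{-1}U} d(e,L)\, d\tilde\alpha_\varepsilon(L)\, d\rho'_\varepsilon(\xi) - C$$

$$= \Big( \int d(e,L)\, d\tilde\alpha_\varepsilon(L) \Big) \big( \rho'_\varepsilon(\pi_B^{-1}U^c) - \rho'_\varepsilon(\pi_B^{-1}U) \big) - C.$$

For small enough $\varepsilon$, we have $\rho'_\varepsilon(\pi_B^{-1}U) \le 1/4$ (and therefore $\rho'_\varepsilon(\pi_B^{-1}U^c) \ge 3/4$). Moreover, Lemma 5.6 ensures that the average distance to the origin for the measure $\tilde\alpha_\varepsilon$ is at least $c \cdot (\eta/\varepsilon)^{1/2}$. Hence, the previous formula completes the proof. □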

Combining Lemmas 5.5 and 5.8, we get

$$h(\lambda_\varepsilon)/\ell(\lambda_\varepsilon) \le C\log(\eta/\varepsilon)/(\eta/\varepsilon)^{1/2}.$$

This tends to 0 since $\eta/\varepsilon$ tends to infinity. We deduce from Lemma 5.1 that $h(\mu_\varepsilon)/\ell(\mu_\varepsilon)$ tends to 0. This is a contradiction since we were assuming that it converges to the maximum $M$, which is positive.

This concludes the proof of Theorem 1.7. □ 

The study of the case where T s is virtually cyclic infinite gives in particular the following 
result. 

Theorem 5.9. Let $(\Gamma, d)$ be a metric hyperbolic group. Let $\Sigma$ be a finite subset of $\Gamma$ which generates a non-elementary group. Let $\mu_i$ be a sequence of measures on $\Sigma$, with $h(\mu_i) > 0$, converging to a probability measure $\mu$ such that $\Gamma_\mu$ is infinite virtually cyclic. Then $h(\mu_i)/\ell(\mu_i) \to 0$.

Note that the precise value of $\ell(\mu_i)$ depends on the choice of the distance, but if two distances are equivalent then the associated drifts vary within the same constants. Hence, the convergence $h(\mu_i)/\ell(\mu_i) \to 0$ does not depend on the distance.

We recover results of Le Prince [LP07]: in any metric hyperbolic group, there exist admissible probability measures with $h/\ell < v$. The construction of Le Prince is rather similar to the examples given by Theorem 5.9.

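Example 5.10. We can use the above proof to also find an example where $h(\mu_\varepsilon)/\ell(\mu_\varepsilon) \to 0$ although $\mu_\varepsilon$ tends to a measure $\mu$ for which $\Gamma_\mu$ is finite and nontrivial. Consider $\Gamma = \mathbb{Z}/2 \times \mathbb{F}_2 = \{0,1\} \times \langle a, b \rangle$, endowed with the probability measure $\mu_\varepsilon$ given by

$$\mu_\varepsilon(0,e) = \mu_\varepsilon(1,e) = 1/2 - \varepsilon - \varepsilon^2, \quad \mu_\varepsilon(0,a) = \mu_\varepsilon(0,a^{-1}) = \varepsilon, \quad \mu_\varepsilon(0,b) = \mu_\varepsilon(0,b^{-1}) = \varepsilon^2.$$

The measure $\mu_\varepsilon$ converges to $\mu = (\delta_{(0,e)} + \delta_{(1,e)})/2$. With the above notations, $\Gamma_\mu = \mathbb{Z}/2 \times \{e\}$ but $\Gamma_s = \mathbb{Z}/2 \times \langle a \rangle$ is virtually cyclic infinite (so that $h(\mu_\varepsilon)/\ell(\mu_\varepsilon) \to 0$) and $\Gamma_r = \Gamma$.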

6. Examples for non-symmetric measures 

In this section, we describe the additional difficulties that arise if one tries to prove Theorem 1.3 for non-symmetric measures. The main problem is that the random walk lives on the subsemigroup $\Gamma_\mu^+$, which is not a subgroup any more. While many cases can be handled with the tools we have described in this article, one case cannot be treated in this way: when the subsemigroup $\Gamma_\mu^+$ has no nice geometric properties (it is not quasi-convex, it is not a subgroup), but $\Gamma_\mu = \Gamma$.

Let us first show that the growth properties of such a subsemigroup can be more complicated than what happens for subgroups. If $\Lambda$ is a subgroup of $\Gamma$, either $|B_n \cap \Lambda| \asymp e^{nv}$, or $|B_n \cap \Lambda| = o(e^{nv})$ (the first case happens if and only if $\Lambda$ has finite index in $\Gamma$, see the discussion at the beginning of Paragraph 4.3). Unfortunately, the behavior of semigroups can be more complicated.
Proposition 6.1. In $\mathbb{F}_2$, there exists a subsemigroup $\Lambda^+$ such that $\liminf |B_n \cap \Lambda^+|/|B_n| = 0$ and $\limsup |B_n \cap \Lambda^+|/|B_n| > 0$.

Proof. Let $\mathbb{S}^n_{a,a}$ denote the geodesic words in $\mathbb{F}_2 = \langle a, b \rangle$ of length $n$ which start and end with $a$. Let $n_j$ be a sequence tending very quickly to infinity. Let $\Lambda^+$ be the subsemigroup generated by $\bigcup_j \mathbb{S}^{n_j}_{a,a}$. Then $|B_{n_j} \cap \Lambda^+| \ge c|B_{n_j}|$. We claim that

$$|B_{n_j-1} \cap \Lambda^+|/|B_{n_j-1}| \to 0.$$

Indeed, the subsemigroup $\Lambda^+_{j-1}$ generated by $\bigcup_{k<j} \mathbb{S}^{n_k}_{a,a}$ has a growth which is exponentially smaller than $e^{nv}$, since some subwords are forbidden in this subsemigroup. Hence, if $n_j$ is large enough with respect to $n_{j-1}$, we have $|\mathbb{S}^{n_j-1} \cap \Lambda^+| = |\mathbb{S}^{n_j-1} \cap \Lambda^+_{j-1}| = o(e^{(n_j-1)v})$. □

In this example, most points in $\mathbb{S}^{n_j} \cap \Lambda^+$ are introduced by $\mathbb{S}^{n_j}_{a,a}$. This shows that $\Lambda^+$ is far from being quasi-convex. In particular, techniques based only on non-quasi-convexity and sub- or super-multiplicativity will never show that $|B_n \cap \Lambda^+| = o(|B_n|)$ for subsemigroups.

Now, we give an example of a well-behaved measure (apart from the fact that it is not symmetric, not admissible and not finitely supported) for which $h = \ell v$. The construction is done in free products. The idea is to forbid simplifications, so that we have an explicit control on the random walk at time $n$. To enforce this behavior, we will work in a free product $\Gamma_1 * \Gamma_2$, and consider a probability measure supported on elements of the form $g_1g_2$ with $g_i \in \Gamma_i \setminus \{e\}$. The next statement applies to some non virtually free hyperbolic groups,
for instance the free product of two surface groups. It also applies to some non-hyperbolic 
groups, more precisely to all finitely generated groups without torsion and with infinitely 
many ends, by Stallings’ theorem. It would be of interest to extend it to all groups with 
infinitely many ends. For this, we would need to also handle amalgamated free products 
and HNN extensions. 


Proposition 6.2. Let $\Gamma_1$ and $\Gamma_2$ be two nontrivial groups, generated respectively by finite symmetric sets $S_1$ and $S_2$. Let $\Gamma = \Gamma_1 * \Gamma_2$ with the generating set $S = S_1 \cup S_2$ and the corresponding word distance. There exists on $\Gamma$ a (nonsymmetric, nonadmissible) probability measure $\mu$, with an exponential moment and nonzero entropy, satisfying $h(\mu) = \ell(\mu)v$.


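Proof. For $i = 1, 2$, let $\Gamma_i^* = \Gamma_i \setminus \{e\}$. We claim that

$$\sum_{(g_1,g_2) \in \Gamma_1^* \times \Gamma_2^*} e^{-v|g_1g_2|} = 1, \tag{6.1}$$

where $v$ is the growth rate of $\Gamma$.

Let $F_i(z)$ be the growth series of $\Gamma_i$, i.e., $F_i(z) = \sum_{g \in \Gamma_i} z^{|g|}$. The spheres $\mathbb{S}_i^n \subset \Gamma_i$ satisfy $\mathbb{S}_i^{n+m} \subset \mathbb{S}_i^n \cdot \mathbb{S}_i^m$. Hence, the sequence $\log|\mathbb{S}_i^n|$ is subadditive. This implies that $\log|\mathbb{S}_i^n|/n$ converges to its infimum $v_i$, and moreover that $|\mathbb{S}_i^n| \ge e^{nv_i}$. We deduce that the radius of convergence of $F_i$ is $e^{-v_i}$, and moreover $F_i(e^{-v_i}) = +\infty$.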

Let $F(z)$ be the growth series of $\Gamma$. As in the proof of Proposition 4.8, it is given by

$$F(z) = \frac{F_1(z)F_2(z)}{1 - (F_1(z) - 1)(F_2(z) - 1)}.$$

Assume for instance $v_1 \ge v_2$. As $F_1(e^{-v_1}) = +\infty$, the function $(F_1(z) - 1)(F_2(z) - 1)$ takes the value 1 when $z$ increases to $e^{-v_1}$, at a point which is precisely the radius of convergence $e^{-v}$ of $F$. This shows that $(F_1(e^{-v}) - 1)(F_2(e^{-v}) - 1) = 1$. This is precisely the equality (6.1).

We define a probability measure $\mu$ on $\Gamma$ as follows: for $(g_1, g_2) \in \Gamma_1^* \times \Gamma_2^*$, let

$$\mu(g_1g_2) = e^{-v|g_1g_2|}.$$

Since there is only one way to generate the word $g_1^{(1)}g_2^{(1)} \cdots g_1^{(n)}g_2^{(n)}$ using $\mu$, we have

$$\mu^{*n}(g_1^{(1)}g_2^{(1)} \cdots g_1^{(n)}g_2^{(n)}) = e^{-v|g_1^{(1)}g_2^{(1)} \cdots g_1^{(n)}g_2^{(n)}|}.$$

Denoting by $X_n$ the position of the random walk at time $n$, it follows that $-\log \mu^{*n}(X_n) = v|X_n|$. Dividing by $n$ and letting $n$ tend to infinity, this gives $h(\mu) = \ell(\mu)v$. □
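The equation $(F_1(e^{-v}) - 1)(F_2(e^{-v}) - 1) = 1$ can be solved numerically on a toy case. The sketch below assumes $\Gamma_1 = \mathbb{Z}/2 = \{e, a\}$ and $\Gamma_2 = \mathbb{Z}/3 = \{e, b, b^{-1}\}$ with $S_1 = \{a\}$ and $S_2 = \{b, b^{-1}\}$, so that $F_1(z) = 1 + z$ and $F_2(z) = 1 + 2z$; this choice of groups, the bisection routine and all names are assumptions made for illustration, not part of the proof.

```python
import math


def growth_series_1(z: float) -> float:
    """Growth series of Z/2 = {e, a}: one element of length 0, one of length 1."""
    return 1.0 + z


def growth_series_2(z: float) -> float:
    """Growth series of Z/3 = {e, b, b^{-1}} with generating set {b, b^{-1}}."""
    return 1.0 + 2.0 * z


def solve_growth_rate(f1, f2, lo: float = 0.0, hi: float = 1.0, iterations: int = 200) -> float:
    """Bisection for the z in (lo, hi) with (f1(z) - 1)(f2(z) - 1) = 1.

    Assumes the product is increasing on (lo, hi) and crosses 1 there.
    Returns v = -log z.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f1(mid) - 1.0) * (f2(mid) - 1.0) < 1.0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))


v = solve_growth_rate(growth_series_1, growth_series_2)
# Left-hand side of (6.1): the pairs (g1, g2) are (a, b) and (a, b^{-1}), both of length 2.
total_mass = sum(math.exp(-v * 2) for _ in ("ab", "aB"))
```

In this toy case the equation reads $2z^2 = 1$, so $v = \frac{1}{2}\log 2$, and the two weights $e^{-2v} = 1/2$ indeed sum to 1, as in (6.1).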

If one is interested in measures with finite support, one can only get the following approximation result. It has the same flavor as Theorem 1.4, but it is both stronger since it also applies to some non-hyperbolic groups, and weaker since the measures it produces are not admissible nor symmetric.

Proposition 6.3. Let $\Gamma_1$ and $\Gamma_2$ be two nontrivial groups, generated respectively by finite symmetric sets $S_1$ and $S_2$. Let $\Gamma = \Gamma_1 * \Gamma_2$ with the generating set $S = S_1 \cup S_2$ and the corresponding word distance. Then

$$\sup\{h(\mu)/\ell(\mu) : \mu \text{ finitely supported probability measure in } \Gamma,\ \ell(\mu) > 0\} = v.$$

Proof. Any element in $\Gamma$ can be canonically decomposed as a word in elements of $\Gamma_1$ and $\Gamma_2$. Let $\mathbb{S}^p_{i,j}$ be the set of elements of length $p$ that start with an element in $\Gamma_i$ and end with an element in $\Gamma_j$. We have the decomposition

$$\mathbb{S}^p = \mathbb{S}^p_{1,1} \cup \mathbb{S}^p_{1,2} \cup \mathbb{S}^p_{2,1} \cup \mathbb{S}^p_{2,2}.$$

One term in this decomposition has cardinality at least $|\mathbb{S}^p|/4$. Hence, there exist $i, j$ such that $\limsup \log|\mathbb{S}^p_{i,j}|/p = v$. Multiplying by fixed elements at the beginning and at the end to go from $\Gamma_i$ to $\Gamma_1$, and from $\Gamma_j$ to $\Gamma_2$, we get

$$\limsup \log|\mathbb{S}^p_{1,2}|/p = v. \tag{6.2}$$

Let $\mu_p$ be the uniform probability measure on $\mathbb{S}^p_{1,2}$. By construction, there are no simplifications when one iterates $\mu_p$. Hence, $\mu_p^{*n}$ is the uniform probability measure on $(\mathbb{S}^p_{1,2})^n$, whose cardinality is $|\mathbb{S}^p_{1,2}|^n$. We get $H(\mu_p^{*n}) = n\log|\mathbb{S}^p_{1,2}|$ and $L(\mu_p^{*n}) = np$. Therefore, $h(\mu_p) = \log|\mathbb{S}^p_{1,2}|$ and $\ell(\mu_p) = p$, giving

$$h(\mu_p)/\ell(\mu_p) = \log|\mathbb{S}^p_{1,2}|/p.$$

Together with (6.2), this proves the proposition. □
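The counting in this proof can be made explicit on a small free product. The sketch below assumes $\Gamma_1 = \mathbb{Z}/2 = \{e, a\}$ and $\Gamma_2 = \mathbb{Z}/3 = \{e, b, b^{-1}\}$ with generating set $\{a, b, b^{-1}\}$ (an illustrative choice, with `B` encoding $b^{-1}$ and hypothetical function names). Every nontrivial element of each factor has length 1, so an element of length $p$ that starts in $\Gamma_1$ and ends in $\Gamma_2$ is an alternating word $a\,b^{\pm 1}\,a\,b^{\pm 1} \cdots$ with $p$ even.

```python
import math
from itertools import product


def sphere_12(p: int) -> list:
    """Normal forms of length p in Z/2 * Z/3 starting in Z/2 and ending in Z/3.

    The letter "B" encodes b^{-1}. The set is empty unless p is even and positive.
    """
    if p <= 0 or p % 2 == 1:
        return []
    # Each of the p/2 syllable pairs is "a" followed by b or b^{-1}.
    return ["".join("a" + c for c in choice) for choice in product("bB", repeat=p // 2)]


def entropy_over_drift(p: int) -> float:
    """log |S^p_{1,2}| / p, i.e. h/l for the uniform measure on S^p_{1,2}."""
    return math.log(len(sphere_12(p))) / p


ratio = entropy_over_drift(10)  # equals log(2) / 2 for every even p in this toy case
```

Here $|\mathbb{S}^p_{1,2}| = 2^{p/2}$, so the ratio $\log|\mathbb{S}^p_{1,2}|/p$ equals $\frac{1}{2}\log 2$ for every even $p$, with no need to pass to the limsup in this particular example.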